


NAVAL POSTGRADUATE SCHOOL 

Monterey, California 






A PRELIMINARY ANALYSIS OF HUMAN FACTORS 
AFFECTING THE RECOGNITION ACCURACY OF A 
DISCRETE WORD RECOGNIZER FOR C3 SYSTEMS 

by 

Howard William Yellen 
March 1983 

Thesis Advisor: G. K. Poock 



Approved for public release; distribution unlimited. 






REPORT DOCUMENTATION PAGE

1. REPORT NUMBER:

2. GOVT ACCESSION NO.:

3. RECIPIENT'S CATALOG NUMBER:

4. TITLE (and Subtitle): A Preliminary Analysis of Human Factors
   Affecting the Recognition Accuracy of a Discrete Word
   Recognizer for C3 Systems

5. TYPE OF REPORT & PERIOD COVERED: Master's Thesis; March 1983

6. PERFORMING ORG. REPORT NUMBER:

7. AUTHOR(s): Howard William Yellen

8. CONTRACT OR GRANT NUMBER(s):

9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Naval Postgraduate School
   Monterey, California 93940

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:

11. CONTROLLING OFFICE NAME AND ADDRESS:
    Naval Postgraduate School
    Monterey, California 93940

12. REPORT DATE: March 1983

13. NUMBER OF PAGES: 190

14. MONITORING AGENCY NAME & ADDRESS (if different from
    Controlling Office):

15. SECURITY CLASS. (of this report): Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:

16. DISTRIBUTION STATEMENT (of this Report):
    Approved for public release; distribution unlimited.

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20,
    if different from Report):

18. SUPPLEMENTARY NOTES:

19. KEY WORDS (Continue on reverse side if necessary and identify
    by block number):
    Voice Recognition
    Human Factors
    Automatic Speech Recognition
    Statistical Significance

20. ABSTRACT (Continue on reverse side if necessary and identify
    by block number):

Literature pertaining to Voice Recognition abounds with information
relevant to the assessment of transitory speech recognition devices.
In the past, engineering requirements have dictated the path this
technology followed. But, other factors do exist that influence
recognition accuracy. This thesis explores the impact of Human Factors
on the successful recognition of speech, principally addressing the
differences or variability among users. A Threshold Technology T-600
was used for a 100 utterance vocabulary to test 44 subjects. A
statistical analysis was conducted on 5 generic categories of Human
Factors: Occupational, Operational, Psychological, Physiological
and Personal. How the equipment is trained and the experience level
of the speaker were found to be key characteristics influencing
recognition accuracy. To a lesser extent computer experience, time
of week, accent, vital capacity and rate of air flow, speaker
cooperativeness and anxiety were found to affect overall error rate.



Approved for public release; distribution unlimited.



A Preliminary Analysis of Human Factors Affecting the
Recognition Accuracy of a Discrete Word Recognizer
For C3 Systems



by



Howard William Yellen
Captain, United States Army
B.A., Temple University, 197H



Submitted in partial fulfillment of the 
requirements for the degree of 



MASTER OF SCIENCE IN SYSTEMS TECHNOLOGY
(COMMAND, CONTROL, AND COMMUNICATIONS)



from the



NAVAL POSTGRADUATE SCHOOL
March 1983













ABSTRACT 



Literature pertaining to Voice Recognition abounds with
information relevant to the assessment of transitory speech
recognition devices. In the past, engineering requirements
have dictated the path this technology followed. But, other
factors do exist that influence recognition accuracy. This
thesis explores the impact of Human Factors on the
successful recognition of speech, principally addressing the
differences or variability among users. A Threshold
Technology T-600 was used for a 100 utterance vocabulary to
test 44 subjects. A statistical analysis was conducted on 5
generic categories of Human Factors: Occupational,
Operational, Psychological, Physiological and Personal. How
the equipment is trained and the experience level of the
speaker were found to be key characteristics influencing
recognition accuracy. To a lesser extent computer
experience, time of week, accent, vital capacity and rate of
air flow, speaker cooperativeness and anxiety were found to
affect overall error rates.



TABLE OF CONTENTS



I. INTRODUCTION

II. COMPUTER RECOGNITION OF SPEECH

    A. OVERVIEW OF VOICE INPUT TECHNOLOGY

    B. THE VALUE OF SPEECH RECOGNITION

        1. Advantages of Speech Recognition

        2. Limitations of Speech Recognition

    C. APPLICABILITY OF COMPUTER RECOGNITION OF SPEECH

        1. Commercial Applications

        2. Military Applications

III. HUMAN FACTORS IN SPEECH RECOGNITION

    A. DEFINITION AND PURPOSE

    B. FACTORS AFFECTING RECOGNITION ACCURACY

        1. General

        2. Differences Between Speakers

        3. Differences Within Speakers

        4. Miscellaneous Factors

IV. DESCRIPTION OF THE EXPERIMENT

    A. OBJECTIVES AND CONSTRAINTS

        1. Objectives

            a. Occupational Characteristics
            b. Operational Characteristics
            c. Personal Characteristics
            d. Physiological Characteristics
            e. Psychological Characteristics

        2. Constraints

    B. SUBJECTS

    C. EQUIPMENT

        1. Voice Recognition System

        2. Spirometer

        3. Peak Flow Meter

        4. Tape Recorder

    D. INSTRUMENTATION

        1. User Questionnaire #1

        2. User Questionnaire #2

        3. STAI Questionnaire

    E. EXPERIMENTAL DESIGN

    F. PROCEDURE

        1. Training

        2. Recognition Testing

        3. Vocabulary

    G. VARIABLES

V. ANALYSIS AND RESULTS

    A. GENERAL

    B. OCCUPATIONAL CHARACTERISTICS

        1. Hypotheses

        2. Job Function

        3. Branch of Service

        4. Job and Service Satisfaction

        5. Previous Computer Experience

        6. Foreign Language Competency

    C. OPERATIONAL CHARACTERISTICS

        1. Hypotheses

        2. Method of Training

        3. Time of Day and Week

        4. User Experience

        5. Ease of Use

    D. PERSONAL CHARACTERISTICS

        1. Hypotheses

        2. Race

        3. Marital Status and Family Size

        4. Religious Preference

        5. Accent

        6. Place of Birth and Geographic Origin

        7. Level of Education

        8. Socio-economic Class

        9. Dental

    E. PHYSIOLOGICAL CHARACTERISTICS

        1. Hypotheses

        2. Age

        3. Height and Weight

        4. Vital Capacity and Rate of Air Flow

        5. Physical Condition

    F. PSYCHOLOGICAL CHARACTERISTICS

        1. Hypotheses

        2. Psychological Anxiety

        3. Speaker Cooperativeness

        4. Recognition Errors

        5. Attitudes Toward the Use of Voice

        6. Attitude Toward Computers and Information Processing

    G. VOCABULARY ERRORS

VI. CONCLUSIONS

APPENDIX A: USER QUESTIONNAIRE #1

APPENDIX B: USER QUESTIONNAIRE #2

APPENDIX C: SELF-EVALUATION QUESTIONNAIRE

APPENDIX D: SELF-EVALUATION QUESTIONNAIRE

APPENDIX E: UTTERANCE LIST: TRAINING WEEK - WEEK #1

APPENDIX F: UTTERANCE LIST: WEEK #2

APPENDIX G: UTTERANCE LIST: WEEK #2

APPENDIX H: DATA COLLECTION FORM

APPENDIX I: MASTER LIST OF UTTERANCES

APPENDIX J: INDIVIDUAL SUBJECT RECOGNITION RATES

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST



LIST OF FIGURES



1. Speech Recognition Model

2. Processing Functions of a Speech Recognition System

3. T-600 Speech Recognition Equipment

4. Acoustic Sound Reduction Chamber

5. Placement of the SHURE SM-10 Microphone

6. Recording Spirometer

7. Use of Recording Spirometer to Measure and Record
   Vital Capacity

8. The Wright Peak Flow Meter

9. Measurement of Speaker's Rate of Air Flow

10. AKAI Tape Recorder

11. Experimental Design

12. Mean Error Rate vs. Job Function

13. Mean Error Rate vs. Branch of Service

14. Mean Error Rate vs. Computer Experience

15. Mean Error Rate vs. Training Method

16. Trials versus Job Function

17. Trials versus Training Method

18. Mean Error Rate versus Accent

19. Mean Error Rate vs. Education

20. Mean Error Rate vs. Vital Capacity

21. Mean Error Rate vs. Rate of Air Flow

22. Scatter Plot for Vital Capacity

23. Scatter Plot for Rate of Air Flow

24. Mean Error Rate vs. State Anxiety (Week #1)

25. Mean Error Rate vs. State Anxiety (Week #2)

26. Mean Error Rate vs. Trait Anxiety

27. Mean Error Rate vs. Speaker Cooperativeness

28. Scatter Plot: Mean Error Rate vs. Question #4

29. Scatter Plot: Mean Error Rate vs. Question #6

30. Scatter Plot: Mean Error Rate vs. Question #8

31. Mean Error Rate vs. # Syllables (by Week)

32. Mean Error Rate vs. # Syllables (Overall)



LIST OF TABLES



I. MILITARY APPLICATIONS FOR SPEECH RECOGNITION

II. DIMENSIONS OF DIFFICULTY FOR SPEECH RECOGNITION

III. SUBJECT CHARACTERISTICS

IV. TEST FOR EQUALITY OF VARIANCES

V. ANALYSIS OF VARIANCE FOR RECOGNITION ACCURACY

VI. MEAN TOTAL ERROR RATES FOR JOB FUNCTION BY WEEKS

VII. AFFECT BY BRANCH OF SERVICE

VIII. AFFECT BY JOB/SERVICE SATISFACTION

IX. AFFECT OF COMPUTER EXPERIENCE

X. AFFECT OF COMPETENCY IN ANOTHER LANGUAGE

XI. MEAN TOTAL ERROR RATES FOR METHOD OF TRAINING BY WEEKS

XII. AFFECT OF TIME OF DAY AND WEEK

XIII. AFFECT DUE TO USER EXPERIENCE

XIV. AFFECT DUE TO EASE OF USE OF VOICE EQUIPMENT

XV. AFFECT OF RACE ON RECOGNITION ACCURACY

XVI. AFFECT OF MARITAL STATUS AND FAMILY SIZE

XVII. AFFECT OF RELIGIOUS PREFERENCE

XVIII. AFFECT OF ACCENT ON RECOGNITION ACCURACY

XIX. AFFECT OF PLACE OF BIRTH AND GEOGRAPHIC ORIGIN

XX. AFFECT OF LEVEL OF EDUCATION

XXI. AFFECT OF SOCIO-ECONOMIC CLASS

XXII. AFFECT OF PAST AND/OR PRESENT DENTAL CARE

XXIII. AFFECT ON RECOGNITION ACCURACY DUE TO AGE

XXIV. AFFECT OF HEIGHT AND WEIGHT ON RECOGNITION ACCURACY

XXV. AFFECT OF VITAL CAPACITY AND RATE OF AIR FLOW

XXVI. AFFECT ON RECOGNITION ACCURACY DUE TO PHYSICAL CONDITION

XXVII. AFFECT ON RECOGNITION ACCURACY DUE TO ANXIETY

XXVIII. AFFECT OF SPEAKER COOPERATION AND PARTICIPATION

XXIX. AFFECT OF RECOGNITION ERRORS

XXX. AFFECT DUE TO ATTITUDES PERTAINING TO THE USE OF VOICE

XXXI. AFFECT DUE TO ATTITUDES TOWARD COMPUTERS AND DATA PROCESSING



ACKNOWLEDGEMENTS 



I wish to express my thanks to my thesis advisor,
Professor Gary Poock, for introducing me to the world of
voice technology, allowing me the independence to conduct
the experimentation as I desired, and for the competitive
challenge posed on the racquetball court; to CDR Chuck
Hutchins for his expertise and advice in Human Factors and
for serving as second reader; to Jay Martin and Ellen
Roland, for their practical advice; and to Paul Sparks for
his technical assistance and advice.

Finally, my sincerest thanks to my wife, Susan, for her
help, understanding and encouragement; and to my son,
Michael, who has spent the better part of three months
wondering where Dad was, for his special smile and big hug
when it was needed the most.



I. INTRODUCTION



The insistence and dependence upon state of the art
equipment has been a predominant characteristic throughout
the efforts within the Command and Control community.
Despite the penchant for newer, better, and more
sophisticated equipment, there must exist some measure of
emphasis on the personnel needed to train with, operate on,
and maintain the readiness of, such equipment. Personnel
considerations cannot be divorced from test programs
designed to identify optimal systems or equipment. When
these considerations are carefully examined, then the data
obtained from such programs can be effectively used to
enhance personnel subsystem design and implementation.

A personnel subsystem test program is one which places
the requisite emphasis on personnel rather than equipment.
Kryter [Ref. 1] enumerates six objectives necessary for a
successful test program.

1. To evaluate whether the system can be operated,
   maintained and controlled by the personnel assigned to
   it.

2. To determine the effect of human performance on system
   performance and vice versa. This objective is aimed
   at discovering critical inadequacies in man-machine
   interaction and subsequently identifying changes that
   would improve their compatibility.

3. To develop valid qualitative and quantitative
   personnel requirements, selection procedures, and
   tables of organizational manning. How many and what
   type of people will provide optimal effectiveness of
   the man-machine interface?

4. To evaluate individual and/or long term operational
   readiness and applicable training programs.

5. To evaluate training equipment and supporting
   materials.

6. To evaluate job aids, technical publications and other
   tools for training and for assisting on-the-job
   performance.

Increased productivity through automation involves two
major issues: technological and human. Speech is a uniquely
human capability. Speech recognition by a computer involves
getting a machine to accept, recognize, and correctly
respond to spoken messages. This machine must take the
input speech, compare it against the expected pronunciation
for allowable utterances, identify the intended message or
utterance, and produce the correct and appropriate response.
To adequately implement the capabilities of such a
technology, the objectives above become all the more
relevant. Of paramount importance is the human, for it
takes people to make all this automation work.

Speech recognizers commercially available today are
effective only within narrow limits. They have relatively
small vocabularies and 'frequently' confuse words. Within
this context, it becomes incumbent upon the user to develop
the skill to talk to the recognizer [Ref. 2: p. 26]. As
such, a recognizer's performance will vary widely from
speaker to speaker.

Much of the work in speech recognition has centered on
the development and improvement of speech recognition
devices. For example:

-- Linear Predictive Coding (LPC) in the early '70s

-- Dynamic programming

-- Development of 1 million bit/sec processors

A user's experience notwithstanding, the human variable in
recognition performance remains strong. This has often been
observed in the past and even led to a description of user
categories [Ref. 2: p. 20] of 'sheep' and 'goats'. These
speech recognition systems work well for the 'sheep', but the
majority of the problems are created by a small segment of
the population - the 'goats'.

While engineers have had a significant impact on the
continued technological advancement of speech recognition,
it is nevertheless critical to remind ourselves of the
interdisciplinary nature of speech recognition. Besides
engineering, the total discipline of speech sciences and
technology includes such traditional disciplines as
psychology, linguistics, anatomy and physiology, computer
sciences and human factors. This thesis endeavors to examine
the impact of human factors on the successful recognition of
speech, principally addressing the differences or
variability among users.

First, the modality of voice input will be examined,
citing some of the more readily apparent advantages and
disadvantages, and an overview provided as to its potential
applicability in a Command and Control environment. With a
general appreciation of speech recognition (the term 'voice
recognition' is synonymous and used interchangeably within
this document) in hand, the variety of human factors that
can affect the successful recognition of speech by a machine
will then be summarized. Subsequently, the experimental
methodology used to examine and differentiate speech
recognition equipment users will be presented. Lastly, the
experimental results will be presented and an analysis
provided of the correlation of each variable examined to its
associated error rates as well as an analysis of variance.



II. COMPUTER RECOGNITION OF SPEECH



A. OVERVIEW OF VOICE INPUT TECHNOLOGY 

Speech recognition can be considered as a subset of a
broader field known as Speech Understanding. Speech
Understanding Systems (SUS) have the objective of
interpreting the intent of the speaker whether or not the
user's speech is grammatically correct or well formed.
While Speech Recognition Systems (SRS) are primarily
interested in the correct recognition of every word, SUS are
concerned with the meaning of entire conversational
segments.

Until now the only significant undertaking has been the
ARPA SUR project [Ref. 3], a five year effort with the
objective of obtaining a breakthrough in speech
understanding capability that would then allow the
development of practical man-machine communication systems.
Specifically, the objectives were to develop a SUS that
would accept continuous speech from many cooperative
speakers of a general American public; a system which used
syntactic analysis, semantics, pragmatic information and
prosodics to acquire an appropriate computer response.

The goals of speech recognition, in contrast, are less
ambitious. Instead of abstract concepts such as meaning or
understanding, SRS try to solve the more practical problems
of analyzing the acoustic waveform and applying pattern
recognition techniques in order to differentiate between
utterances [Ref. 4]. Figure 1 illustrates a typical speech
recognition model.

The acoustic speech signal is first analyzed to extract
such acoustic parameters as frequency spectrum and the
energy in different time segments. Next, information
carrying features are extracted that define various phonetic
events such as how noisy (fricative-like) the signal is,
positions of different vowel-like sounds and vibration of
the speaker's vocal cords. This information is then used to
divide the speech into time slices or segments, which are
labelled with phonetic categories. The phonetic sequence
for the input speech is matched to stored sequences of
expected pronunciations for the words in the lexicon or
dictionary, and the best matching sequences are determined
to be the most likely word(s) that had occurred in speech.
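
The final matching step of the model can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions, not the method of any particular recognizer: the
lexicon entries, the phonetic labels, and the use of edit distance
as the measure of "best match" are all hypothetical choices made
for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phonetic labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical lexicon: word -> expected phonetic label sequence.
LEXICON = {
    "nine": ["n", "ay", "n"],
    "time": ["t", "ay", "m"],
    "five": ["f", "ay", "v"],
    "seven": ["s", "eh", "v", "ah", "n"],
}


def best_matches(observed, lexicon=LEXICON):
    """Return (distance, word) pairs sorted from best to worst match."""
    return sorted((edit_distance(observed, pron), word)
                  for word, pron in lexicon.items())
```

Calling best_matches(["n", "ay", "m"]) ranks "nine" and "time"
equally, each one label away from the input, which is the kind of
tie that leads a recognizer to confuse similar sounding words.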

Speech recognition systems can be considered as
belonging to one of two categories: continuous (connected)
or isolated (discrete) speech systems. Continuous systems
are those which can extract information from strings of
words even though the words run together as in natural
speech. Isolated systems require a short pause before and
after utterances that are to be recognized as entities. The
minimum duration of a pause is typically between 100-200
msec. An isolated word recognizer is also limited in the
duration of the spoken utterance, usually 2-4 seconds.
Continuous speech recognizers are just now beginning to
appear on the market but are expensive and their
capabilities and reliability have yet to be realistically or
practically evaluated. For the remainder of this thesis our
discussion will be confined to discrete recognition systems.

[Figure 1. Speech Recognition Model (From Reference 4). Block
diagram, from input to output: Acoustic Analysis -> acoustic
parameters -> Phonetic Feature Extraction -> information-carrying
features -> Phonetic Segmentation and Classification -> phonetic
sequence for input -> Word Matching (against the Lexicon) ->
hypothesized words.]

Two other concepts of speech recognition to be discussed
are that of speaker independence and vocabulary size.
Speaker dependent systems are those which require speaker
adaptation (or 'training') in order to achieve recognition.
This is in contrast to speaker independent systems which
will recognize speech regardless of the speaker. In terms
of speech recognition equipment and their associated
vocabularies, most recognizers work well with small
vocabularies of 10-50 words [Ref. 5: p. 80]. The possibility
of confusion between words increases as the vocabulary size
increases, and to some extent the chance of similar sounding
words increases with such larger vocabularies.

At this juncture it is appropriate to expand our
definition of 'words' to encompass more than just individual
words. As used herein, 'word' is used interchangeably with
the term 'utterance' and may be either a single mono- or
polysyllabic word or a combination of mono- or polysyllabic
words joined into a phrase (e.g., Place-a-Circle-on-Moscow).



The four processing functions [Ref. 6] contained in a
limited vocabulary voice recognition system, as shown in
Figure 2, consist of a transducer, preprocessor, feature
extractor, and a final decision-level classifier.

1. Transducer: The microphone is the interface between
   the user and the system and converts the spoken phrase
   into electrical signals that are analyzed by the other
   components of the system.

2. Preprocessor: No matter how it is represented,
   spectral information must be explicitly or implicitly
   contained in all speech encodings. The initial
   analyses produce parametric representations [Ref. 17]
   and take place in the preprocessor. This segment of
   the system transforms the speech signal in order to
   enhance certain properties and make them more easily
   detectable in a speech recognition system. The signal
   is normalized in time by dynamic programming for
   subsequent comparisons with various reference
   patterns. Data compression removes any extraneous or
   irrelevant information. Both time and frequency
   domain analytical techniques are performed on the
   input signal. Speech analysis is achieved by either
   direct analog spectrum analysis via fast Fourier
   transform (FFT) in the frequency domain, or linear
   predictive coding (LPC) in the time domain.

3. Feature Extraction: The key processing function in a
   pattern recognition system is the feature extractor.
   The more optimal the set of acoustical features
   extracted and sent to the classifier, the less complex
   the classifier need be to achieve a given accuracy
   level. This segment of the system produces a set
   number of significant acoustical features (depending
   on the individual recognizer), a few of which include
   spectral slopes, phonetic classification, and initial
   estimate of word boundary.

4. Classifier: The classification process is performed
   in software using a minicomputer. When a speaker
   issues an utterance, the encoded features and their
   time of occurrence are stored in short term memory.
   The duration of the utterance is broken into time
   segments and the features reconstructed into the
   normalized time base. Reference patterns, previously
   input by the speaker for the system's vocabulary of
   words, are compared to the feature occurrence patterns
   and a 'best-fit' or 'closest-match' determined for a
   word decision. The number of bits of information for
   the feature map of each reference pattern is
   determined by mapping the number of acoustic features
   onto the number of time segments.

[Figure 2. Processing Functions of a Speech Recognition System
(From Reference 6). Block diagram: Transducer -> Preprocessor ->
Feature Extraction -> Classifier (Decision Logic).]
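
The mapping of features onto a normalized time base can be
illustrated with a short Python sketch. It assumes, purely for
illustration, that each detected feature is reported as a
(feature index, time of occurrence) pair and that time is
normalized by simple linear scaling into equal segments; the
function names and the event format are hypothetical and are not
taken from any recognizer's documentation.

```python
def feature_map(events, duration, n_features, n_segments):
    """Build a binary feature map on a normalized time base.

    events: iterable of (feature_index, time) pairs, 0 <= time <= duration.
    Returns n_features rows, each holding n_segments bits.
    """
    grid = [[0] * n_segments for _ in range(n_features)]
    for feature, t in events:
        # Scale the time of occurrence onto the normalized base and
        # clamp the end point into the last segment.
        seg = min(int(t / duration * n_segments), n_segments - 1)
        grid[feature][seg] = 1
    return grid


def map_size_bits(n_features, n_segments):
    """One bit per (feature, time segment) cell."""
    return n_features * n_segments
```

Under these assumptions the same word spoken at half the speed
produces the same map, because only the relative position of each
feature within the utterance is kept. With illustrative values of
32 features and 16 time segments, map_size_bits gives 512 bits per
reference pattern.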



The first two processing functions are accomplished by a
hard-wired preprocessor and feature extractor. This
achieves real-time processing since only the classification
function is performed in a general-purpose minicomputer
[Ref. 6: p. 177].

A discrete word recognizer must be 'trained' for
individual talkers and/or words. This can be done by a user
simply speaking a set number of training samples into the
device to provide a reference set of features. The system
stores in memory the reference set of word features for each
word (utterance) the user has spoken. Once the system is
trained, the user may speak words into the device during
normal operation and these are compared with the stored
patterns. The 'closest fit' is selected as the recognized
word. This sequence of events is commonly partitioned into
the training and recognition modes of operation.
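
The two modes of operation can be sketched as follows. This is a
minimal illustration, not the algorithm of any specific device: it
assumes each utterance has already been reduced to a fixed-length
tuple of feature bits, that training combines the samples by a
per-bit majority vote, and that 'closest fit' means the fewest
differing bits. The class name and both methods are hypothetical.

```python
class TemplateRecognizer:
    """Toy speaker-dependent recognizer with train and recognize modes."""

    def __init__(self):
        self.templates = {}  # word -> reference pattern (tuple of bits)

    def train(self, word, samples):
        """Training mode: per-bit majority vote over equal-length samples."""
        n = len(samples)
        self.templates[word] = tuple(
            1 if 2 * sum(bits) >= n else 0 for bits in zip(*samples)
        )

    def recognize(self, pattern):
        """Recognition mode: return the word whose stored reference
        pattern differs from the input in the fewest bit positions."""
        def distance(word):
            return sum(a != b for a, b in zip(self.templates[word], pattern))
        return min(self.templates, key=distance)
```

A short usage example: after r.train("yes", [(1,1,0,0), (1,1,0,1),
(1,0,0,0)]) and r.train("no", [(0,0,1,1), (0,1,1,1), (0,0,1,1)]),
the call r.recognize((1,1,0,1)) selects "yes" because that
reference pattern is the closer of the two. A speaker who talks
differently in training than in use shifts the input away from the
stored patterns, which is one way user variability enters.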

There are two types of errors that can occur in speech
recognition. The first is a rejection, or the inability of
the recognizer to correctly classify an utterance. The
second, and in a practical sense more troublesome, is a
misrecognition. This occurs when the recognizer classifies
an utterance as something other than what was spoken.
Better recognizers usually have recognition algorithms
designed to reject rather than guess at questionable words.
Higher quality systems such as Threshold (Models 600 and
680) have error rates that are quite acceptable [Ref. 3, 9,
10]. Extensive experimentation has shown approximate error
rates to be between 0.2 and 11.4 percent [Ref. 6: pp. 179-
180]. Of course, what constitutes an acceptable error rate
is critically dependent upon the particular application and
data entry rate.
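
The distinction between the two error types can be made concrete
with a small Python sketch. It assumes, for illustration only,
that the classifier produces a distance to each vocabulary word
and rejects the utterance when even the best distance exceeds a
threshold; the function names, the threshold, and the trial format
are hypothetical.

```python
def classify(distances, reject_threshold):
    """Return the closest word, or None (a rejection) when even the
    best match is farther away than reject_threshold."""
    word = min(distances, key=distances.get)
    return word if distances[word] <= reject_threshold else None


def error_rates(trials):
    """Tally errors over (spoken, recognized) pairs, in percent.

    recognized is None for a rejection; any other value that differs
    from the spoken word counts as a misrecognition.
    """
    n = len(trials)
    rejections = sum(1 for _, rec in trials if rec is None)
    misrecognitions = sum(1 for spoken, rec in trials
                          if rec is not None and rec != spoken)
    return {
        "rejection": 100.0 * rejections / n,
        "misrecognition": 100.0 * misrecognitions / n,
        "total": 100.0 * (rejections + misrecognitions) / n,
    }
```

Lowering the threshold in this sketch trades misrecognitions for
rejections, which matches the preference for rejecting rather than
guessing at questionable words: a rejection can simply be
repeated, while a misrecognition may enter wrong data unnoticed.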

B. THE VALUE OF SPEECH RECOGNITION

The Department of Defense has been very active in the
past few years in its efforts to assess the merits of
voice recognition with machines. Such locations as the
Naval Postgraduate School, Wright-Patterson Air Force Base,
Rome Air Development Center, Naval Air Development Center
and assorted other agencies and contractors have conducted
extensive tests in order to examine human interaction with
machines through the use of voice input and other
modalities. In order to comprehend the need for further
research pertaining to voice input technology, it is
essential to review the advantages and limitations that this
type of technology offers. More importantly, it is
essential to understand its potential capabilities and
applications in a military environment. Is speech
recognition beneficial (considering costs of $200 -
$80,000+), practical, and usable enough to justify the
continued expenditures of research and development funds
(6.1 and 6.4) and operational monies?



1. Advantages of Speech Recognition 



Proponents of computer recognition of speech will
continually extol the virtues and unlimited possibilities
the technology offers. In an abbreviated fashion, the five
general advantages of voice input to machines may be
summarized as follows:

-- Natural communication
-- Training
-- Multimodal communication
-- Fast communication
-- Error reduction in data input

Speech is our most natural mode of communication.
It is a familiar, spontaneous and convenient method of
expressing one's thoughts, ideas, or intentions. Untrained
users of voice recognition systems, regardless of whether
they can read, write, type or keypunch, can all speak or
make sounds. These characteristics of the speech input
modality make it applicable for users at all general skill
levels, from systems engineers to computer operators to blue
collar workers on an assembly line.

A user of speech recognition equipment requires
little or no training. They have only to restrict their
spoken utterances to those which the machine can recognize.
In the case of discrete systems, isolated words are
separated by a short pause so as to ease the location of
word boundaries, and word choices are limited to those which
the machine has been trained to recognize. Although this
appears to be disadvantageous, it is more realistically a
compromise to natural speech in that no adverse effects are
imposed on the user in terms of operating the speech
recognition equipment.

Experimentation [Ref. 11: p. 608] has shown that
speech, instead of interrupting communications necessary to
perform other tasks, can enable users to do these tasks
simultaneously with voice and thereby reduce or, at a
minimum, not add to the time required to perform a complex
task. The advantage of having one's hands and eyes free to
do other tasks is perhaps the pivotal point in the
determination of applicability of speech recognition
devices. This multimodal aspect allows us to place the
microphone anywhere (headset mounted, hand-held, on a stand)
and still communicate commands and information. Threshold
Technology even has a wireless microphone [Ref. 12] that
permits extensive mobility while talking to computers.

The fastest modality for communications by a human
is speech. An individual can speak twice as fast as the
average typist can type [Ref. 5: p. 45]. This has been
clearly demonstrated by Ochman and Chapanis [Ref. 11] whose
experimental results showed that communication via
typewriter or handwriting could not approach speech in terms
of speed or task efficiency. Further substantiation from
the Naval Postgraduate School [Ref. 8: p. 2] showed that
voice entry was 17% faster than typing, after only three
hours of training. Additionally, while speech recognition
accuracy is slightly degraded by mental or motor loading of
the user [Ref. 13: p. 32], voice is nevertheless faster and
more accurate than other input modes when the user must
perform another task while simultaneously interacting with
the speech recognition equipment [Ref. 8: p. 2].

By now it is clear that speech recognition permits
data entry directly into the computer without intermediate
steps such as manual transcription or keypunching which are
subject to error. Again, research at the Naval Postgraduate
School has shown that 183% more errors occurred in manual
data manipulation (typing) than by voice [Ref. 8: p. 2].
Such common entry errors as the transposition of digits,
which are usually caused by eye movement or other
distractions, are almost eliminated with the use of
automatic speech recognition [Ref. 14].

2. Limitations of Speech Recognition

If a particular technology were devoid of errors or
practical limitations, we could assume universal application
and implementation. Although the advantages of speech
recognition are seemingly well established, there do exist
several problems associated with the ability to speak to
machines. These limitations include:

-- User variability
-- Constrained speech
-- Isolated speech
-- Breath noise
-- User confusion
-- Environmental factors

Speakers exhibit a wide range of personal
characteristics that add a significant measure of difficulty
in the ability of a machine to recognize speech. A
speaker's sex, geographic origin, and articulation
experience are just a few of the elements that result in a
user's variability. Consistency is also a key element in
successful recognition accuracy. A speaker may talk quite
differently in training the machine as compared to when he
or she may use it in a practical application. Additionally,
physical changes in the speaker such as age, physical
condition, stress (physical or emotional), or fatigue, to
name a few, can induce variability that will ultimately
affect successful recognition accuracy.

An isolated word recognition system imposes a
restricted (constrained) vocabulary, both in terms of size
and content, upon the user. This becomes a limitation when
we consider that most people are accustomed to speaking in 
natural, fluent prose. Because of the limited vocabulary, 
users must be careful of the types of words included for 
recognition. The similarity of sound structures between 
words (e.g., Nine vs. Time) adds a measure of confusion that
can subsequently affect overall performance. Design of
a vocabulary for a particular application is an important
and controllable factor in determining the acceptability of
voice input for a given task.
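
The vocabulary design concern above can be illustrated with a
minimal sketch that flags candidate word pairs whose sound
structures are similar. It is not a method taken from the
references. The phoneme spellings, the use of edit distance as
a similarity measure, and the 0.7 cutoff are all assumptions
made for illustration only.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def confusable_pairs(vocabulary, max_normalized_distance=0.7):
    """Return (word1, word2, distance) for pairs at or below the cutoff.

    The distance is normalized by the longer phoneme sequence, so
    0.0 means identical and 1.0 means nothing in common.
    """
    flagged = []
    for (w1, p1), (w2, p2) in combinations(sorted(vocabulary.items()), 2):
        d = edit_distance(p1, p2) / max(len(p1), len(p2))
        if d <= max_normalized_distance:
            flagged.append((w1, w2, round(d, 2)))
    return flagged


# Hypothetical phoneme spellings, chosen only for this example.
VOCAB = {
    "nine": ["n", "ay", "n"],
    "time": ["t", "ay", "m"],
    "five": ["f", "ay", "v"],
    "zero": ["z", "ih", "r", "ow"],
}

for pair in confusable_pairs(VOCAB):
    print(pair)
```

Under these assumptions the pair Nine/Time is flagged because
the two words share their vowel, while a word such as Zero is
not. A vocabulary designer could use a screen of this kind to
choose alternative words before training begins.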

Because isolated word recognizers depend
significantly upon the detection of a minimum pause between
words, word boundary detection becomes perhaps the single
most critical limitation. The usual method is to measure
changes in energy levels [Ref. h]. An isolated word is
detected at a point where the energy in the acoustic signal
rises above a certain threshold. At the end of the word,
the energy drops, and the resultant silence indicates that
the utterance is over. But energy fluctuations are not
enough to detect all word boundaries, and thus advanced
detection techniques will have to involve detection and
inclusion of stop consonants within words, while eliminating
pauses due to 'lip-smacks' or breath noise.
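
The energy threshold method described above can be sketched
as follows. This is a simplified illustration and not the
algorithm of any particular recognizer. The frame length, the
threshold, and the function names are assumptions. Requiring
several consecutive quiet frames before closing a word is one
simple way to keep a brief stop-consonant gap inside the word.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_words(energies, threshold, min_pause_frames):
    """Return (start, end) frame index pairs, with end exclusive.

    A word begins when the energy rises above the threshold and
    ends only after min_pause_frames consecutive frames at or
    below it. Shorter dips are treated as part of the word.
    """
    words = []
    start = None
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                words.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # The signal ended before a full pause was observed.
        words.append((start, len(energies) - silent_run))
    return words


# Hypothetical frame energies: the one-frame dip at index 4 is
# bridged, while the three quiet frames at 6-8 end the word.
example = [0, 0, 5, 6, 0, 7, 0, 0, 0, 4, 4, 0, 0, 0]
print(detect_words(example, threshold=1, min_pause_frames=3))
```

Note that a detector of this kind responds to any energy above
the threshold, which is why exhalation at speech-like levels or
a lip-smack would be reported as a word unless further tests
are applied.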

In a limited vocabulary, isolated word recognition
system, breath noise can be a serious problem [Ref. 5: p.
174]. An individual who is involved in little or no
physical movement while engaged with a voice recognition
system can achieve very high recognition accuracy. This
accuracy can soon deteriorate once the user begins to move
around. Inhaling will not cause any adverse effects when
using a close-talking, noise-cancelling microphone, but
exhaling will produce signal levels comparable to speech
levels. As physical activity increases, so does one's
breathing rate, and the resulting increase in exhalation
will lead to the above-mentioned deterioration in
recognition accuracy.

While voice input provides multimodal
communications, this particular advantage has an inherent
limitation in that the user can become confused as to what
mode to use. As a result, input modalities can become
confused and interfere with each other, so that the total
rate of information transfer may not be as high as the sum
of the rates possible with each separate modality.

Finally, the environment in which the speech
recognition device is placed may have an inadvertent effect
on recognition accuracy. For example, speech recognition in
an aircraft cockpit may be degraded due to engine noise or
conflicting voices emanating from aircraft radio
communications. Or, consider the placement of such
technology in a crowded Military Command Center where its
reliability can be affected by background noise from other
members located in the nearby work space.

C. APPLICABILITY OF COMPUTER RECOGNITION OF SPEECH
1. Commercial Applications

The first voice input systems to be used by industry 
were installed in late 1972 and early 1973 [Ref. 15]. These
early applications included: 

-- quality control and inspection
-- automated material handling
-- direct voice input to computers

Their successful implementation was due in large part to
recognition accuracies that were greater than or equal to
the manual keying accuracies obtained from the same
personnel.

In most quality control and inspection processes the
inspector's hands and/or eyes are occupied in the inspection
task. Through the use of a voice recognition system it is
possible to combine the inspector's normal work requirements
with the simultaneous entry of all data measured and
observed. Owens-Illinois Corporation installed voice data
entry equipment in early 1973 for the inspection of color
television faceplates. Here was an application where the 
inspector "had to manipulate, orient, and measure parameters 
using gauges and meters". The requirement to simultaneously 
record the measurement data also existed. In this example 
the operator was able to achieve both tasks at once [Ref. 6: 
pp. 182-183]. 

Voice entry has been utilized in recent years to
control the movement of materials such as parcels,
containers, baggage, etc. through distribution and sorting
centers. A voice controlled package routing system
installed by S.S. Kresge in November 1974 allowed just one
operator to handle each item, read the label, and speak the
destination code for each carton into his/her microphone.
Formerly this had been an operation that required two
persons and still resulted in the 'bunching' up of different
size packages. Following the installation of voice
activated sorting equipment, the bunching problem was
eliminated, productivity increased, and sorting errors
were reduced [Ref. 6: p. 185].

2. Military Applications

These applications may be placed in the general
categories of equipment and process control, field data
entry, data management, and cooperative man-machine tasks.
A more definitive classification was proposed by Beek et
al. in 1977 [Ref. 16] to include the general areas of
Security, Command and Control, Data Transmission and
Communication, and Processing Distorted Speech. Table I
provides a recapitulation of military tasks that could be
considered for speech recognition technology.

Of particular interest is the use of speech
recognition for Command and Control applications. The term
C3, Command, Control, and Communications, refers to an
overall system comprising, as a minimum, these key elements:

a. Command Authority: The commander provides the central
authority, unity of purpose, and the overall concept
as to how operations will be conducted to accomplish
mission objectives.






TABLE I

MILITARY APPLICATIONS FOR SPEECH RECOGNITION
(From Reference 16)

I. SECURITY
   A. Speaker Verification (authentication)
   B. Speaker Identification (recognition)
   C. Determination of emotional effects (e.g., stress)
   D. Recognition of spoken codes
   E. Secure access voice identification
   F. Surveillance of communication channels

II. COMMAND AND CONTROL
   A. System control (ships, aircraft, situation
      displays, etc.)
   B. Voice operated computer input/output
   C. Data handling and record control
   D. Material handling (mail, baggage, publications)
   E. Remote control (hazardous materials)
   F. Administrative record control

III. DATA TRANSMISSION AND COMMUNICATION
   A. Speech synthesis
   B. Vocoder systems
   C. Bandwidth reduction
   D. Ciphering/coding/scrambling

IV. PROCESSING DISTORTED SPEECH
   A. Diver speech
   B. Astronaut communication
   C. Underwater telephone
   D. Oxygen mask speech
   E. High 'G' force speech






b. Organization: This element provides the pathways
through which the plans, priorities, and directives of
the commander are provided to the force and through
which information pertaining to the forces can be
provided to the central authority. These pathways are
found at each echelon in the form of command posts,
operations centers, or command centers.

c. Communications: This provides the means for
transmitting plans, priorities, and orders to elements
of the force and the means by which the forces may
inform the Commander of their activities and needs.

d. Information: A key element that facilitates control
by confronting the Commander with only that
information required to support the decision-making
process. Information supports both the staff
planning and command decision-making processes at all
levels.

The command centers that provide the requisite
organizational framework perform several vital functions
for the Commander. First is the capability to communicate
securely, and preferably by voice, over a wide choice of
circuits. Secondly, each command center has the task of
integrating information which comes from its supporting
elements. A third capability provided by these centers is
the processing and display of information. The fourth
function, associated with the third, is the quick and
accurate dissemination of information, reports, and
directives for the Commander.

We are particularly interested in the function of
information processing and dissemination as it provides a
suitable application for computer recognition of speech.
Command center automation, resulting in more efficient
communications, will lead to increased productivity. In its
broadest sense, communication is the management of
information, and information, not paper, is the chief
product of the command center. Our C3 systems that are
designed and fielded for these centers, and speech
recognition as a component of such, can provide our
Commanders the capability to "observe", "decide", "act", and
"react" with speed, decisiveness and accuracy.

Navy feasibility studies sponsored by Naval
Electronics Command and conducted by Dr. G. K. Poock of the
Naval Postgraduate School examined the potential for voice
data entry for Command, Control, and Communications. Two
voice recognition systems were installed in 1980 at Fleet
Headquarters, Commander-in-Chief Pacific (CINCPACFLT) in
Hawaii to examine the benefits and limitations of voice
input for operation of the Worldwide Military Command and
Control Time-Sharing System (WWMCCS TSS) and the Ocean
Surveillance Intelligence System (OSIS) [Ref. 17: p. 24].






Poock has also demonstrated that using voice input
to exercise a typical scenario on the ARPANET, an
experimental network since 1969 employing packet switching
technology and connecting over 150 host computers, was
significantly faster and more accurate than entering the
commands manually [Ref. 8]. Twenty-four subjects followed a
fixed scenario of instructions where they accessed the
ARPANET, logged into different host computers, read
messages, sent messages, read files, transferred files
between host computers, deleted files and interconnected 
host computers. Simulated command centers operating on this 
network include the Naval Postgraduate School (Monterey, 
California), Naval Ocean Systems Center (San Diego, 
California) and CINCPACFLT (Hawaii). 

Automatic speech recognition has also been found to
have considerable potential for imagery interpretation and
intelligence report generation [Ref. 17: p. 49]. A
significant amount of research has been performed for the
Defense Mapping Agency (DMA) for such applications as voice
data entry for the processing of Digital Landmass System
(DLMS) data, preparation of Flight Information Publication
(FLIP) data and ocean-depth measurements for digitized
cartographic applications. In all these applications the
environment is such that the operator's hands are busy and
the task frequently involves the use of stereo optics and
other special devices. Voice has been shown experimentally
to be a faster, easier, and less fatiguing mode of data
entry than historically more conventional means [Ref. 17:
p. 37]. More recently, the feasibility and advantages of
voice input technology were described for use in the COINS
Network Control Center (CNCC). The Community On Line
Intelligence System interconnects on-line information
storage and retrieval systems located at a number of
locations within the United States intelligence community
[Ref. 18].



III. HUMAN FACTORS IN SPEECH RECOGNITION 



A. DEFINITION AND PURPOSE 

Human factors is concerned with improving the 
productivity of the user by taking into account human 
characteristics in the design of a system. As described by 
Huchingson [Ref. 19: p. 4],

The term "human factors" is more comprehensive, covering
all biomedical and psychosocial considerations applying
to man in the system. It includes not only human
engineering, but also life support, personnel selection
and training, training equipment, job performance aids,
and performance measurement and evaluation.

The people referred to in this definition are those who
typically operate, maintain or service the system. They are
those who will interact with the system's design. When the
focus is on a broader interpretation it is appropriate to
speak of a Human Factors Subsystem or Personnel Subsystem as
was described earlier.

Human factors engineering deals principally with the
many factors involved in the design of a new system - from
hardware to personnel. For our efforts in this analysis,
the current technology has been determined to be acceptable
and experimentally as well as operationally reliable for
its use in a Command and Control environment. Now, user
variability is to be investigated further in terms of how it
affects recognition accuracy.






Since energy in a speech signal is usually displayed in
terms of frequency, intensity and time, it would seem
plausible that each word should have a unique acoustic wave
pattern and, if so, word recognition would be a simple
matter of the voice recognition system scanning the pattern,
comparing the sample pattern with a data bank of reference
word patterns, and deciding which word was spoken.
Unfortunately, human variability upsets this uniquely
simplistic approach. Our purpose then is to discuss the
human as a component in a complex system designed by humans
and to note the fundamental advantages and limitations of
the human in relation to an automated voice recognition
system.
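
The idealized scan, compare, and decide procedure described
above can be sketched as a nearest-template search. This is
an illustration under stated assumptions and not the method
of the recognizer used in this thesis. Each pattern is assumed
to be a fixed-length list of numbers, Euclidean distance is
assumed as the measure of similarity, and the reject threshold
is a hypothetical parameter.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(pattern, templates, reject_threshold):
    """Return (word, distance) for the closest reference pattern.

    The word is None when even the best match is farther away
    than reject_threshold, i.e. the utterance is rejected.
    """
    best_word, best_dist = None, float("inf")
    for word, reference in templates.items():
        d = distance(pattern, reference)
        if d < best_dist:
            best_word, best_dist = word, d
    if best_dist > reject_threshold:
        return None, best_dist
    return best_word, best_dist


# Hypothetical reference patterns for a two-word vocabulary.
TEMPLATES = {
    "nine": [1.0, 0.0, 1.0],
    "time": [0.0, 1.0, 0.0],
}

print(recognize([0.9, 0.1, 0.8], TEMPLATES, reject_threshold=1.0))
print(recognize([5.0, 5.0, 5.0], TEMPLATES, reject_threshold=1.0))
```

In this sketch, human variability appears as movement of the
spoken pattern away from its stored reference. The farther it
moves, the more likely the utterance is to be rejected or
matched to the wrong word.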

B. FACTORS AFFECTING RECOGNITION ACCURACY 
1. General

Limitation of vocabularies to 1 21 words has
resulted in identification accuracies of between 98% - 99%
in a controlled laboratory environment. In an operational
or field setting recognition accuracies have been reported
as low as 50% [Ref. 20: p. 626]. Various factors noted for
interfering with successful identification have included
background noise, inconsistent microphone placement,
insufficient training, inconsistent speaking style, and the
lack of user cooperation. Lea, in a paper titled "What
Causes Speech Recognizers to Make Mistakes?" [Ref. 21], calls
for the determination of those factors that influence
recognition accuracy rather than the repeated assessment of
transitory devices. Table II summarizes the four 'dimensions
of difficulty' Dr. Lea has proposed. What needs to be
accomplished is the characterization of the relative effects
of changes along each of these four dimensions or, more
simply stated, to find the factors influencing the accuracy
of machines that recognize speech.

Because there are so many variables involved that
affect recognition accuracy, the list in Table II may be
reorganized in a "communication-theoretic" framework. This
framework models the speech recognition error rate as a
function of seven complex sets of factors [Ref. 5: pp. €y-
93] that include:

— Task Factors 
— Human Factors 
— Language Factors 

— Channel and Environmental Factors 
— Algorithmic Factors 
— Performance Factors 
— Response Factors 

TABLE II

DIMENSIONS OF DIFFICULTY FOR SPEECH RECOGNITION
(From Reference 5)

TASK AND PERFORMANCE REQUIREMENTS
   1. Form of speech to be recognized
   2. Accuracy requirements
   3. Required throughput rates
   4. Type of device necessary

HUMAN VARIABILITY
   1. Sex
   2. Dialect
   3. Vocal tract size
   4. Vocal cord characteristics
   5. Pronunciation habits of speaker
   6. Physical state
   7. Psychological state
   8. Workload
   9. Cooperativeness
  10. Time of day/week
  11. Time since training
  12. Number of training samples/word
  13. Rate of talking

LANGUAGE DIFFICULTIES
   1. Size of active subvocabulary
   2. Word length
   3. Word sound structure
   4. Confusability
   5. Language spoken
   6. Syntactic, semantic, and pragmatic constraints
   7. Enhanceability
   8. Stress pattern
   9. Intonational variability
  10. Rhythm and timing variability

ACOUSTIC DIFFICULTIES
   1. Noise level
   2. Type(s) of noise
   3. Bandwidth
   4. Spectral distortions
   5. Transducer characteristics
   6. Placement of the transducer
   7. Amplitude
   8. Vibration
   9. Acceleration

It is the set of Human Factors that this experiment
and analysis is principally concerned with, for it is this
stage of the model that has a major impact on speaker
variability. This set of human factors can be further
subdivided [Ref. 21: p. 2] in order to monitor their
influence on recognition error rates. A few of these are
listed below:

-- Speaker Experience
-- Training Method
-- Sex of the Speaker
-- Physical Dimensions of the Speaker
-- Geographic Origin of the Speaker
-- Speaker Dialect
-- Physical State of the Speaker
-- Psychological State of the Speaker
-- Speaker Cooperativeness
-- Time of Day or Week

Because different speakers may demonstrate widely
varying methods of pronouncing words or phrases, the above
listed factors may be further separated into two categories:
those occurring between speakers and those affecting each
individual speaker. First, some of the differences between 
speakers that induce variability will be briefly examined 
and then the variabilities apparent within each speaker that 
can affect recognition accuracy will be discussed. 

2. Differences Between Speakers

Speaker Experience: This factor can take on a two-fold
meaning when looking at it as a source of variability.
First is the experience of using voice recognition
equipment. Experienced voice recognition users should be
expected to have a higher and more reliable recognition
accuracy than those who are 'naive' to the technology.
These experienced users are comfortable using the equipment,
less likely to be intimidated by the system, and are
familiar with its performance capabilities from previous
usage. The other meaning of speaker experience has to do
with job skill. Can a user who operates in a microphone
environment on a daily or regular basis, such as an Air
Traffic Controller or a Pilot, be expected to have better
recognition rates than those who have never spoken into a
microphone? A data processor who works regularly in an
environment demanding precise data entry by keyboard might
have the type of experience or skill factor that would
provide an edge over a prospective user possessing only
basic typing skills. This type of experience overlaps
slightly with speaker cooperativeness and will be elaborated
upon later.

Method of Training: The ideal form of voice
interaction would be for a user to pick up the microphone,
speak commands the machine can understand, and for the
appropriate response to take place. Naturally, this is the
goal of speaker independent systems, but since humans all
speak differently and our form of speech recognizer is
discrete, we are mandated to provide the machine some
information about how we speak each word intended for our
desired vocabulary (i.e., training). The method by which the
machine is trained by the user will in large part dictate
subsequent recognition accuracy. If the user is closely
supervised and made to carefully speak the particular
vocabulary, then we should be able to expect higher
recognition rates than for the user who is given
cursory instructions on the use of the equipment and allowed
to go on independent of further supervision during the
training mode. An adjunct of training method is the number
of training 'samples' or pronunciation patterns. It is
difficult to achieve accurate speech recognition when the
number of training passes per word is small or smaller than
manufacturer specifications [Ref. 22]. Even with identical
equipment, it would still be reasonable to anticipate some
speakers, having had fewer training samples per word, having
more success than others who have had more samples per word.
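
The role of training passes can be illustrated with a short
sketch. It assumes, purely for illustration, that a reference
pattern is formed by averaging the feature values of each
training pass, and that the spread between passes indicates
how consistently the word was spoken. The recognizer examined
in this thesis may form its reference patterns differently.

```python
def train_template(passes):
    """Average several training passes into one reference pattern.

    passes is a list of equal-length feature vectors, one per
    repetition of the same vocabulary word.
    """
    if not passes:
        raise ValueError("at least one training pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length")
    return [sum(column) / len(passes) for column in zip(*passes)]


def pass_spread(passes):
    """Largest difference seen in each feature across the passes."""
    return [max(column) - min(column) for column in zip(*passes)]


# Hypothetical feature values from three repetitions of one word.
passes = [
    [1.0, 4.0, 2.0],
    [3.0, 4.0, 2.0],
    [2.0, 4.0, 5.0],
]
print(train_template(passes))
print(pass_spread(passes))
```

With only one or two passes, a single careless repetition
dominates the reference pattern. With more passes its
influence is diluted, which is one plausible reason why
manufacturers specify a minimum number of passes per word.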

Sex: Male voices have lower frequencies than
female voices, and a more detailed spectral structure
results from the lower pitch of their voices. This detailed
structure is more indicative of the vocal mechanism and of
the intended vowels and consonants spoken. Male voices tend
to fare better with recognizers employing frequency domain
analysis while female voices tend to have greater success
with machines using time domain analysis [Ref. 5]. A recent
comparison was conducted [Ref. 22] which revealed no
statistically significant difference between the sexes.
Although not a primary objective of the thesis, it remains a
source of variability that merits some measure of analysis.

Speaker Dialect: Dialects not only affect the 
specific sound produced for each vowel or consonant type, 
but also exhibit different dynamics of speech production. 
For example, Southerners have their readily identifiable 
drawl, whereas a New Yorker will tend to say "Toid" rather 
than "Third" and residents of Cambridge, Massachusetts can 
be heard to talk about "Hahvahd" instead of "Harvard".

Physical Dimensions: Throughout the literature on 
speech recognition one will see speaker variability 
attributed to a variety of factors, none of which include 
the physical dimensions of the speaker. An examination of 
the recognition accuracy for a selected sample population 
based on physical dimensions would provide an interesting
insight into the ramifications of such a factor as a
component within a personnel selection subsystem. In other
words, what effect, if any, will height and weight have on
recognition accuracy? 

Geographic Origin: This particular factor is 
multidimensional consisting of several sub-factors which 
require careful examination: 

-- Place of birth
-- Geographic area of upbringing
-- Ethnic background
-- Religious preference

The above may impose idiosyncratic or social differences in
habits which can produce variations in sound and
subsequently in pronunciation. These sub-factors all
contribute a measure of variety that can presumably affect
recognition accuracy.

3. Differences Within Speakers

Physical State: The present physical state of a
user of voice recognition equipment can precipitate
variability in his or her voice. For example, a cold, some
form of pathological condition, fatigue, etc. can alter the
speaker's voice. The individual's voice quality could be
different based on physical conditioning. Is the user who
works out regularly and stays in excellent physical
condition more likely to show higher recognition rates than
one who rarely exercises, smokes regularly and generally is
not in the best of health?

Psychological State: Spielberger [Ref. 23: p. 29]
defines transitory or state anxiety as a complex, unique
emotional condition that can vary in intensity and fluctuate
over time. State anxiety may be thought of as consisting of
unpleasant, consciously perceived feelings of tension and
apprehension with an accompanying activation or arousal of
the autonomic nervous system. The concept of trait anxiety
refers to the relatively stable individual differences in
anxiety proneness. It may also be a reflection of the
frequency and intensity with which state anxiety has been
previously manifested and the probability that such anxiety
will occur in the future [Ref. 23: p. 39]. The fact that
physiological functioning is affected during periods of
anxiety is readily apparent. The degree to which speakers
deal with state or trait anxiety may well be a significant
variable in the examination of error rates of voice
recognition systems.

Speaker Cooperativeness: How enthusiastic and/or
willing a speaker is toward the use of voice recognition
equipment could induce speaker variability and hence affect
subsequent recognition accuracy. In a military environment
where many job positions are of a non-voluntary variety, it
is conceivable to expect the selection of voice recognition
users who are told to operate the equipment regardless of
their personal preferences. If the user distrusts the
technology or prefers manual entry, and is still required
to use voice, we have developed a non-cooperative user. A
non-cooperative user is, therefore, one who is consciously
trying to undermine the successful operation of the machine.
The cooperative user is one who is willing to help the
machine by saying precisely what the machine wants and
pronouncing it in a clear and consistent manner. There is a
certain grey area surrounding this factor with the presence
of users who, although not consciously trying to confuse the
device, are not fully committed to "helping the machine" to
recognize the correct utterances.

Time of Day/Week: Each person's speech is variable
depending upon time of day, changing from morning to evening
and even changing progressively over a period of time [Ref.
5]. An examination of recognition performance over extended
periods of time [Ref. 24: p. 1] showed a statistically
stable performance over time (21 weeks) with no serious
degradation occurring as time elapsed. Nevertheless, a user
who has a gap in time between training and operational use
may forget any special ways he/she trained the machine. How
much of a gap is tolerable is a subject for future research.

4. Miscellaneous Factors

Some additional human factors that have been
proposed [Ref. 5] deserve a brief description. They have
been relegated to a separate section because, for one reason
or another (lack of equipment, current technical skills,
lack of measurable quantitative data, etc.), experimental
examination at the present time has been precluded. These
factors include:

-- Form of speech
-- Speaker dependence
-- Rate of speech
-- Vocal tract size
-- Speaker's glottal spectrum



Form of speech refers to the type of voice
recognition system to be used, isolated or continuous.
Continuous systems, being a quantum step above isolated
systems in terms of complexity, bring about a greater
opportunity for speaker variability to manifest itself.
Such things as detection of word boundaries, slurring of
speech (e.g., "dija" vs. "did you"), and prosodic
characteristics could seriously affect recognition accuracy
because of the complications which a continuous speech
recognition system introduces.

A speaker independent system negates the requirement 
for training and thus variability between speakers becomes a 
more critical factor for independent systems to contend 
with. Independent recognizer performance will have to be 
tailored to accommodate an unlimited number of potential 
speakers and their associated variability.

The faster a person speaks, the more likely that the
expected pronunciation will be altered due to slurring,
deleted syllables, etc. If a machine is trained to one
form of pronunciation and at one particular rate of speech,
a differing rate in an application mode will cause an
increase in recognition difficulty. Because the isolated
word recognizer used in the experimentation requires a
minimum pause of 100 msec between utterances, and utterances
not exceeding 2.0 seconds in duration, this particular
factor was not considered essential to the overall analysis.
It is, rather, an important factor in terms of continuous
recognition systems.
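
The two timing limits quoted above (a pause of at least 100
msec between utterances and utterances of no more than 2.0
seconds) can be expressed as a simple check. This is a sketch
for illustration only. The function name, the representation
of an utterance as a (start, end) pair in milliseconds, and
the message texts are assumptions and are not features of the
recognizer itself.

```python
MIN_PAUSE_MS = 100        # minimum silence required between utterances
MAX_UTTERANCE_MS = 2000   # longest utterance the recognizer accepts


def check_timing(utterances):
    """Report timing violations in a sequence of utterances.

    utterances is a list of (start_ms, end_ms) pairs in time
    order. The result is a list of (index, problem) pairs and
    is empty when every utterance respects both limits.
    """
    problems = []
    for i, (start, end) in enumerate(utterances):
        if end - start > MAX_UTTERANCE_MS:
            problems.append((i, "utterance too long"))
        if i > 0:
            pause = start - utterances[i - 1][1]
            if pause < MIN_PAUSE_MS:
                problems.append((i, "pause before utterance too short"))
    return problems


# A speaker who hurries leaves only 50 msec before the second word.
print(check_timing([(0, 500), (550, 1200)]))
```

A speaker who talks faster in operational use than in training
is more likely to violate the pause limit, in which case two
words would be presented to the recognizer as one.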

The size of the vocal tract will produce changes in
the formants of the speech signal; the smaller the vocal
tract, the higher the formants. This can have an impact on,
for example, transmission through limited bandwidth
channels. Vocal cord characteristics also produce
interspeaker variability such as pitch or "resonant" quality
of the voice. Speakers with more "resonant" voices that
project well will be easier for recognizers to handle [Ref.
5: p. 78].






IV. DESCRIPTION OF THE EXPERIMENT



A. OBJECTIVES AND CONSTRAINTS 
1. Objectives

As noted earlier, our overall objective was to
examine the human as a component in a complex system. In
narrower terms, this experimentation attempts to assess the
effect of differing occupational, operational, personal,
physiological, and psychological characteristics of a user
on the accuracy with which a currently available voice
recognition system will correctly interpret spoken
utterances. Subsequently, our discussion will address the
occurrence, if any, of existing quantitative parameters that
would enable us to differentiate between effective and
non-effective users of voice recognition systems.

The following specific characteristics are examined
in this thesis. Many of the individual characteristics, or
human factors, are self-explanatory while others are
provided with a brief explanation and/or rationale for
selection.

a. Occupational Characteristics 

This set of parameters examines the possible 
effect on recognition accuracy due to differences inherent 
in a user's occupational skill or job (military or civilian) 
background. Specific characteristics include:

-- Job function: Comparison of recognition rates
   between microphone experienced users (e.g., pilots,
   air traffic controllers) and non-experienced users.

-- Branch of service: A factor with possible
   consequences pertaining to its use in personnel
   selection criteria.

-- Job satisfaction: A subjective evaluation by the
   user as to his/her job satisfaction in their current
   duty assignment and their satisfaction within the
   Armed Services.

-- Previous computer experience: Computer experienced
   personnel (e.g., Data Processors) are expected to
   have a better appreciation for the advantages of
   voice input and thus be more conscious of their
   efforts and positively motivated for higher
   recognition accuracy.

-- Foreign language competency: Frequently military
   and civilian members associated with DOD are
   required to possess the capability to fluently speak
   a foreign language. This ability is another factor
   that could affect one's speech.

b. Operational Characteristics 

This set of parameters examines the possible 
effect on recognition accuracy due to factors surrounding 
the operational use of voice recognition equipment. 

Specific characteristics include:

-- Training method: Analysis of recognition rates for
   those users who are supervised during the training
   mode compared to those who are allowed to train the
   equipment individually.

-- Time of day and week: A determination of whether
   the time frame in which a speaker trains the
   recognizer will have any subsequent effect on
   recognition accuracy.

-- Equipment experience: Comparison of recognition
   rates between experienced users of voice recognition
   equipment and those who have never used the
   equipment before ('naive' users).

-- Ease of use: The operational simplicity of the
   equipment could affect a speaker's performance. For
   example, a speaker who considers the recognizer a
   complex and operationally difficult device will be
   less likely to devote his or her maximum effort to
   their performance.



c. Personal Characteristics 

The following are various characteristics 
considered to have a possible effect on an individual's 
speech patterns, and hence, affect the recognition accuracy 
of a voice system. These parameters include: 



— Race

— Marital status and family size: A correlate of
  psychological state and, although equally likely to
  be included as a psychological characteristic, it is
  considered here as a criterion for personnel
  selection. Family size refers to the number of
  offspring the user has, as opposed to the size of
  the family in which one was raised.

— Religious preference/Ethnic background

— Accent or dialect

— Place of birth/geographic origin

— Level of education

— Socioeconomic class: Similar in nature to the
  characteristic of marital status, but considered
  more for its merit in the selection of personnel
  than for its effect on individual speech patterns.

— Dental or orthodontal care: Braces, corrections for
  improper bite, or major oral surgery are considered
  for their implications for the speech patterns of
  those individuals and the resultant error rate.

d. Physiological Characteristics

These characteristics are also considered to
have an effect on speech and, as a result, are factors of
interest when examining recognition accuracy and speaker
variability. These parameters include:

— Height

— Weight

— Age

— Physical condition: A subjective evaluation by the
  user of his/her current physical condition.

— Rate of airflow: Measurement of ventilatory
  function to provide a diagnosis of conditions
  affecting voice. This measurement can also be used
  as an indication of possible airway obstruction.

— Vital capacity: The maximum volume of air
  which can be exhaled following maximum inhalation.
  This measure provides an estimate of the amount of
  air potentially available for phonation.

— Speech training: Examines whether formal speech or
  voice training affects recognition accuracy.

e. Psychological Characteristics 

The current psychological state of a user, their
cooperativeness, and their personal attitudes toward
automation and voice all contribute to the overall
effect on recognition accuracy. The particular parameters
investigated include:

— Psychological anxiety

— Speaker cooperativeness

— Effect of errors on subsequent performance

— Attitudes toward voice recognition equipment as a
  time-saving job aid

— Attitudes toward computers and data automation

In effect, items 4-6 are related to speaker cooperativeness
in that how a user feels about computers and voice
recognition could affect their willingness to reliably
support the use of voice recognition equipment.

2. Constraints

Accomplishment of test objectives was constrained
by the research facilities of the Naval Postgraduate
School. In the interest of time, experimentation was
limited to five weeks.

Because voice production is an extremely complex
event in which auditory, acoustic, and aerodynamic events
are produced by the interaction of physiological mechanisms,
it would be beneficial to measure as many vocal
parameters as possible in order to achieve a complete and
accurate picture of voice production, its associated
variability among speakers, and its correlation with voice
recognition accuracy. Lack of equipment, time, and/or
expertise precluded examination of such factors as:

— Glottal waveform

— Transfer function of the vocal tract

— Sound-pressure level

— Maximum duration of sustained phonation

— Maximum frequency levels

— Modal frequency level

B. SUBJECTS

Forty-four subjects participated in the experiment on a
volunteer basis. The group was composed of 25 military
officers, 17 military enlisted, and 2 civilians. The
military officers, representing the Army, Air Force, and Navy,
consisted of 21 males and 4 females, while the enlisted
personnel, representing the Army and Navy, consisted of 11
males and 6 females. The civilians included a professor from
the NPS Oceanography Department and an employee of the
Defense Manpower Data Center (DMDC) in Monterey. The rank
cr grace of the military subjects ranged from G-Z to 0-4 for 
the commissioned officers, L'dZ to CW3 for the Warrant 
Officers, and E6 tc Z7 for the enlisted personnel. The 
subjects ages ranged from ZZ to 47, with an average age of 
60 . 

It was desired that the speakers selected for the test
be representative of the population for which the recognizer
is to be used, in our case a Command and Control environment
and, in particular, a military command center. Subjects
taking part in the experiment were representative of this
environment, as shown by the grade distribution and types of
military occupational specialties, although some of these
specialties are not readily apparent from the current job
description (e.g., Medical NCO).

Twenty-five of the subjects were from Fort Ord and
included a variety of backgrounds such as pilots, air
traffic controllers, signal officers, signal non-
commissioned officers (NCOs), and infantry platoon
sergeants. Five of the subjects were data processors: 2
from the Fleet Numerical Oceanographic Center in Monterey
and 3 from administrative offices of the Naval Postgraduate
School. Twelve subjects were students at NPS enrolled in the
Command, Control, and Communications (C3) curricula. A wide
diversity in their backgrounds is illustrated by previous
job categories such as aviation, communications, systems
programming, communications maintenance, command and staff,
and nuclear engineering.

Twelve of the subjects had experience using voice
recognition equipment, having participated in previous voice
experimentation [Ref. 9]. A summary of subject
characteristics is provided in Table III.



C. EQUIPMENT

1. Voice Recognition System

A Threshold Technology Inc. Model T-600 voice
recognition system was used to represent a commercially
available, state-of-the-art recognizer, one which has been
well documented as to its reliable recognition accuracy.
The T-600 is a speaker-dependent, isolated-word speech
recognition device which automatically recognizes spoken
words and phrases. These words and phrases (utterances) may
be as brief as 0.1 second but will usually range from 0.25



TABLE III

SUBJECT CHARACTERISTICS

SEX            SERVICE                  LOCATION       VOICE
Male:   34     Army:      [illegible]   Ft Ord: 25     Experienced Users: 12
Female: 10     Navy:      [illegible]   NPS:    16     Naive Users:       32
               Air Force: 7             FNOC:    2
                                        DMDC:    1

RANK
[Counts by grade are illegible in the source]    CIV: 2

OCCUPATIONAL BACKGROUNDS
Pilots: 2                      Air Traffic Controllers: 5
Data Processors: 5             Supply Officer: 2
Medical Officer: 1             Medical NCO: 1
Signal Officer: 2              Signal NCO: [illegible]
Finance Officer: 1             Engineer NCO: 1
Operations Officer: 1          Professor: 1
Computer Systems Manager: 1
Graduate Students: 12 (which include)
    Pilots: 2
    Communications Officer: 2
    Communications Maintenance Officer: 2
    Systems Programmer: 1
    WWMCCS Programmer: 1
    Submarine Nuclear Engineer: 1
    Infantry Unit Commander: 1
    AUTODIN Supervisor: 1



to 1.0 seconds and must be separated by very short pauses of
0.1 second or more. The terminal allows a user to begin an
utterance before it has completed processing the previous
one, but in this experiment the rate of speech was
controlled by use of the READY indicator light located on
the tape cartridge unit. This light indicates when the
terminal is ready to accept the next utterance in both the
training and recognition modes [Ref. 25].

The Threshold 600 in its standard configuration is
composed of the following four elements:

— Terminal consisting of:
    analog speech preprocessor
    LSI-11 microcomputer
    digital RS-232 input/output interface

— Standard CRT/Keyboard Display Terminal

— Remote Voice Input Unit (Microphone preamplifier)

— Tape Cartridge Unit

The terminal, CRT display, microphone preamplifier, and tape
cartridge unit were table mounted (Figure 3) within an
acoustic sound reduction booth (Figure 4). A conventional
SHURE model SM-10 'boom' microphone, supplied as standard
equipment with the T-600, was used. The microphone possesses
a special noise-cancelling design which allows the T-600 to
perform accurately despite most extraneous background noises
(Figure 5).







Figure 3. T-600 Speech Recognition Equipment

Figure 4. Acoustic Sound Reduction Chamber




The speech preprocessor accepts the speech signal
input from the microphone preamplifier and passes it through
a spectral analyzer for word boundary detection. The
feature extractor monitors for 32 phonetically relevant
features and converts these to digital signals. Word
boundaries are detected from occurrences of low energy. A
minimum pause of 0.1 second must occur to prevent confusion
between words. Any breathing noise at the end of the word is
removed. The remaining speech is divided into 16 fixed time
segments, and features are reconstructed onto the normalized
16-segment time base.
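The time normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the T-600 combines features within a segment, so the rule used here (a feature is marked present in a segment if it occurs in any frame of that segment) and the function name `normalize_to_segments` are hypothetical.

```python
def normalize_to_segments(frames, n_segments=16):
    """Collapse a variable-length utterance onto a fixed time base.

    frames: list of equal-length lists of 0/1 feature flags, one per
    analysis frame. Returns n_segments rows of feature flags.
    ASSUMPTION: a feature is set in a segment if any frame in that
    segment had it set; the real device may use a different rule.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_features = len(frames[0])
    segments = [[0] * n_features for _ in range(n_segments)]
    for i, frame in enumerate(frames):
        seg = i * n_segments // len(frames)  # segment this frame falls in
        for j, bit in enumerate(frame):
            segments[seg][j] |= bit
    return segments


# Example: 32 frames, each with exactly one of 32 features set.
frames = [[1 if j == i else 0 for j in range(32)] for i in range(32)]
pattern = normalize_to_segments(frames)
bits_per_word = len(pattern) * len(pattern[0])  # 16 x 32
```

With 16 segments and 32 features, every utterance is reduced to the same 512-bit size regardless of how long it took to speak, which is what makes a direct comparison against stored patterns possible.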

The microcomputer compares input signals
against stored reference patterns. Each word is represented
by 512 (16 x 32) bits of information. The closest fit
between an incoming template and the alternative stored
training templates is found, and that 'closest' word is
declared the word identity, unless the score is so low that
no decision can be made and the utterance is rejected
outright. The vocabulary reference patterns are
established by the subject 'training' the recognizer. This
is accomplished by the subject making a set number of
repetitions of the various vocabulary utterances.
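The closest-fit decision with outright rejection can be illustrated with a short sketch. The scoring function the T-600 actually uses is not given in the text; Hamming distance over the 512 bits and the rejection threshold of 128 are assumptions chosen only to make the idea concrete, and the names `hamming` and `recognize` are hypothetical.

```python
def hamming(a, b):
    """Count the positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def recognize(pattern, templates, reject_above=128):
    """Return the word whose stored template is closest to the input.

    templates: dict mapping word -> 512-bit reference pattern.
    Returns None (a rejection) when even the best match differs from
    the input by more than reject_above bits.
    ASSUMPTION: Hamming distance and the threshold value are
    illustrative, not the device's documented method.
    """
    best_word, best_dist = None, None
    for word, reference in templates.items():
        dist = hamming(pattern, reference)
        if best_dist is None or dist < best_dist:
            best_word, best_dist = word, dist
    if best_dist is None or best_dist > reject_above:
        return None
    return best_word


# Two toy templates and two inputs.
templates = {"alpha": [0] * 512, "bravo": [1] * 512}
near_alpha = [1] * 10 + [0] * 502      # 10 bits away from "alpha"
ambiguous = [1] * 256 + [0] * 256      # 256 bits away from both
```

Here `recognize(near_alpha, templates)` selects "alpha", while `recognize(ambiguous, templates)` returns None, the analogue of the recognizer rejecting an utterance rather than guessing.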

Once a match is found, the appropriate character(s)
are sent via the output interface to the CRT to indicate to
the user which utterance was recognized. These terminal
matches are further categorized as misrecognitions, where
the terminal's 'closest' match to the reference vocabulary
was not precisely the same utterance spoken, or
recognitions, in which the utterance spoken is exactly
recognized and so reflected in the CRT output. Rejection of
an utterance is a third category and is indicated by an
audible 'beep'.

The remote voice input unit allows components to be
remotely located up to 2000 feet from the terminal processor
and provides the means to adjust the volume (amplification)
of the amplifier to accommodate the normal speaking voice of
each particular subject.

The tape cartridge unit is a digital tape recorder
used to store and recall application data and an individual
subject's vocabulary reference patterns. Once the data
cartridge is recorded it contains all the information
necessary to initialize the Threshold 600 terminal for each
subject. The T-600 is capable of storing a 256-word
vocabulary which may be recorded or loaded in a few minutes
using the tape unit.

2. Spirometer

A recording spirometer (Figure 6), a type of
gasometer, was used for measuring and recording vital
capacity. It consists of a metal tank containing a movable
piston with a water seal, air input line, exhaust valve for
resetting, ink stylus, and revolving cylinder for mounting
chart paper calibrated in cubic centimeters.

As the subject breathes into the mouthpiece (Figure
7), air replaces water in the inner piston, which rises by an
amount proportional to the exhaled air. The subject, once
fitted with the mouthpiece, is given instructions to inhale
to the greatest extent possible and then exhale all the air.
This procedure was repeated three times and the average
vital capacity used for analysis purposes.

Figure 6. Recording Spirometer

Figure 7. Use of a Recording Spirometer to Measure
and Record Vital Capacity

3. Peak Flow Meter

The Wright Peak Flow Meter was used to measure the
maximum air flow rate in a single forced expiration. The
instrument (Figure 8) consists of a pivoted vane, the
rotation of which is opposed by the resistance of a spring. The
plastic mouthpiece fits into the radial inlet which leads to
the vane. Attached to the vane is a spindle and pointer.
The forced expiration causes the vane and pointer to rotate
until the maximum attainable flow has been reached. Once
reached, the pointer is held in position by a ratchet until
released by a reset button on the back of the device. The
scale is graduated in liters per minute in h liters/minute 
divisions over a range of 60 to 1000 liters/minute. 

Procedurally, the subject stands and holds the meter
in a vertical plane as depicted in Figure 9. He/she then
takes as deep a breath as possible, places the mouthpiece in
the mouth, grips it tightly with the teeth, and seals it
with his/her lips. The subject blows out as hard as
possible in a short, sharp expulsion of air. This procedure
was performed three times with the average noted as the
appropriate peak expiratory flow.

Figure 9. Measurement of Speakers' Rate of Air Flow

4. Tape Recorder

An AKAI 4000 DS Mk-II magnetic tape recorder was
used for the recording, storage, and reproduction of speech
sounds (Figure 10). The device is a typical analog magnetic
tape recorder consisting of three basic parts: the
electronics of the system, the head assembly,
and the tape transport. These components take a phenomenon,
such as the speech sound, that changes in time and record
it as a continuous event.

Figure 10. AKAI Tape Recorder

Tapes were recorded for all 44 subjects during their
participation in the experiment. Subject to availability of
analytical software at NPS, further acoustical analysis
could be conducted on speaker variability that might
substantiate and support statistical conclusions.

D. INSTRUMENTATION

Three questionnaires were used to elicit the
evaluations, judgements, comparisons, attitudes, and
background history of the subjects participating in the
experimentation. The first two questionnaires were designed
[Ref. 26] to provide the necessary information to delineate
subjects into various groups representing those human
factors discussed earlier. The third questionnaire was used
to measure state and trait anxiety levels during various
periods of the experiment. The questionnaires were
"author-administered" in order to provide clarification, if
needed, of any written instructions and to ensure that all
respondents completed the questionnaires correctly, giving
appropriate consideration to each item.

Three types of questionnaire items were used: open-
ended, multiple choice, and rating scale. The open-ended
items permitted the subject to express his/her answer to the
question in his/her own words. In all cases, these questions
required short (one or two words) objective replies. The
multiple choice questions allowed each respondent to choose
the appropriate answer from a list of several options.
These multiple choice questions include "dichotomous" items,
for example, those requiring only a YES or NO response.
Finally, rating scale items were used to obtain judgements
or attitudes about some object, concept, or system. These
questions permitted the assignment of various response
alternatives along an unbroken continuum or in ordered
categories along the continuum. Both a graphic scale,
allowing the respondent to place his/her judgement any place
along the line, and a numerical scale, confining the
subject's response to a discrete category along the
continuum, were employed.

1. User Questionnaire #1

User Questionnaire #1 (Appendix A) employs a
combination of question items including open-ended, multiple
choice, and graphical rating scale items. Questions 1-22
are designed to obtain information pertaining to
occupational, personal, and physiological characteristics.
Questions 22-40 obtain attitudinal, comparison, and
evaluation information pertaining to occupational,
operational, physiological, and psychological
characteristics.

2. User Questionnaire #2

User Questionnaire #2 (Appendix B) utilizes a
combination of question items including multiple choice and
graphical rating scale items. Questions 1-3 obtained
information relative to physiological factors while
questions 1-15 were repetitious items from User
Questionnaire #1 designed to obtain attitudinal information
from the subjects after using speech recognition equipment
for four weeks.

3. STAI Questionnaire

The State-Trait Anxiety Inventory (STAI) is
comprised of separate self-report scales for measuring two
distinct anxiety concepts: state anxiety (A-State) and
trait anxiety (A-Trait). This inventory was developed by
Spielberger et al. at Vanderbilt University and later
continued at Florida State University. It was reproduced
with the special permission of the publisher, Consulting
Psychologists Press, Inc., Palo Alto, California.

The STAI A-Trait scale consists of 20 statements
(Appendix C) that ask people how they generally feel. The
A-State scale also consists of 20 statements (Appendix D),
but the instructions require subjects to indicate how they
feel at a particular moment in time. The STAI was designed
to be self-administered and was given individually to each
subject. Complete instructions are printed on each test
form for both the A-Trait and A-State scales. There were no
time limits imposed for completion of the form. Although
many of the items have face validity as measures of anxiety,
the inventory was referred to as a Self-Evaluation
Questionnaire. Each subject responds to every STAI item by
circling the appropriate number to the right of each item
statement on the form. Scoring keys are depicted with each
scale in Appendices C and D [Ref. 27].

E. EXPERIMENTAL DESIGN

A three-factor mixed design with repeated measures on
one factor was employed in this experiment. In
consideration of the wide variety of human factors to be
examined, the experiment was designed to allow an analysis
of three critical factors (occupational experience with
microphones, operational training method, and experience)
affecting recognition accuracy while simultaneously
gathering sufficient data to accomplish subsequent analysis
of individual characteristics of speaker variability. The
two between-group variables were microphone experience and
training method. The third factor, experience (Week #), was
the within-group variable. A summary of the experimental
design appears in Figure 11.



F. PROCEDURE

1. Training

For the T-600, the training procedure consists of
entering 10 passes of each utterance into the voice
recognizer. A word list of 100 utterances (Appendix E) was
provided to the subject; each utterance was prompted on the
CRT, the 10 passes were spoken, and then the next utterance
on the list was prompted. Based on the experimental design,
subjects were divided into two groups: supervised and non-
supervised. Those supervised during training received
detailed instructions and close scrutiny on each of the 10
passes by the experiment administrator. If the subject
failed to clearly pronounce the utterance, if the volume level
was insufficient, or if the required 0.1 second pause was
omitted, the word was immediately retrained. Non-supervised
subjects received the same instructions and a short
demonstration of the training procedure and, when ready,
were allowed to train the equipment individually with no
supervision by the experiment administrator.

Figure 11. Experimental Design
[2 x 2 x 3 layout: microphone experience (experienced,
inexperienced) by training method (supervised,
non-supervised) by week (#1, #2, #3); Groups I-IV of
eleven subjects each, Subjects 1-44]

Training was accomplished only during the first week
of the experiment. Subjects training in the morning (0730-
1220 hours) would subsequently test during those periods, and
likewise for those subjects training in the afternoon
(1400-1900 hours). Immediately after training, all subjects
made at least two passes of the entire 100-word vocabulary
(similar to a test session) to identify any problems in
training of a particular utterance. If the utterance was
correctly identified on both passes, it was considered
trained. However, if an error (either misrecognition or
non-recognition) occurred, a third pass was made. If fewer
than two of the three passes of any utterance were correct,
that utterance was retrained.
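The post-training check described above amounts to a small decision rule per utterance. The sketch below is one reading of that rule; the function name `needs_retraining` and its boolean arguments are hypothetical and introduced only for illustration.

```python
def needs_retraining(pass1_ok, pass2_ok, pass3_ok=None):
    """Decide whether an utterance must be retrained.

    Two correct passes: the utterance is considered trained.
    Any error in the first two passes: a third pass is required,
    and the utterance is retrained if fewer than two of the three
    passes were correctly recognized.
    """
    if pass1_ok and pass2_ok:
        return False
    if pass3_ok is None:
        raise ValueError("a third pass is required after an error")
    correct = sum([pass1_ok, pass2_ok, pass3_ok])
    return correct < 2


# One error followed by a correct third pass: two of three correct.
example_kept = needs_retraining(True, False, True)
# Only one of three passes correct: retrain.
example_retrain = needs_retraining(False, False, True)
```

An error here means either a misrecognition or a non-recognition (rejection); the rule does not distinguish between them.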



After the equipment was trained, each subject was
measured for vital capacity and peak flow rate. Finally,
User Questionnaire #1 was administered. Total time for the
training session averaged l.i hours per subject. 

2. Recognition Testing

Following training, subjects were tested on the
system. Each subject made 2 passes through the entire
vocabulary list on each of three days during the week.
Duration of the experiment was three weeks. During Week #1
the vocabulary list remained in the same order as during
training (Appendix E), while in Week #2 the order of the
utterances was reversed (Appendix F) and in Week #3 the
order was randomized (Appendix G). The purpose of this
change in vocabulary order was to reduce the effect of
learning due to repetitiveness, and thereby provide a more
realistic picture of speaker variability. Data was
collected in the form of recognitions, misrecognitions, and
non-recognitions using Appendix H.

The STAI questionnaire for A-State scale measurement
was administered just prior to the first testing session
(Week #1, Trials 1-2) to determine anxiety levels prior to
using voice equipment. During Week #2 another STAI
questionnaire for A-State scale was administered following
the first test session of that week. The final STAI form,
for the measurement of A-Trait scales, was administered
during Week #2. User Questionnaire #2 was provided to each
subject at the conclusion of the experiment.

3. Vocabulary

It was desired that a test vocabulary similar to a
vocabulary intended for practical application in a military
environment be used. Of concern in the design of the
vocabulary was the fact that brief monosyllabic words are
more difficult to recognize than longer polysyllabic words
or phrases. A relatively equal distribution of words and
utterances containing a syllabic content ranging from 1 to
5 syllables was selected as the final vocabulary. The
words were chosen both from previous experimentation [lief 
£3] and the author's military experience. Appendix I 
provides a listing of the 100 utterances used in the
experiment and considered as representative of use in a
military command center.

G. VARIABLES 

The dependent variable in this experiment was total
errors, a linear combination of misrecognitions and non-
recognitions. Independent variables in the overall
experimental design are experience, job function, and
training method. Additional independent variables included
each of the individual human factor characteristics listed
earlier.

Data was collected on the eleven subjects within each
group of the experimental design. Each subject made 600
utterances per week for a grand total of 1800 for the
experiment. Total utterances for the completed experiment
numbered 79,200 (44 x 1800).
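The utterance counts follow directly from the procedure: a 100-word vocabulary, 2 passes per session, 3 sessions per week, 3 weeks, and 44 subjects. The sketch below works through that arithmetic and shows how a total error rate in percent could be formed from the two error types; the constant names and the helper `error_rate_percent` are hypothetical, and the sample error counts are made up for illustration only.

```python
VOCABULARY_SIZE = 100     # utterances in the word list
PASSES_PER_SESSION = 2    # passes through the list per test day
SESSIONS_PER_WEEK = 3     # test days per week
WEEKS = 3                 # duration of recognition testing
SUBJECTS = 44

per_week = VOCABULARY_SIZE * PASSES_PER_SESSION * SESSIONS_PER_WEEK
per_subject = per_week * WEEKS
total = per_subject * SUBJECTS


def error_rate_percent(misrecognitions, non_recognitions, utterances):
    """Total errors (misrecognitions + non-recognitions) as a percentage."""
    return 100.0 * (misrecognitions + non_recognitions) / utterances


# Illustrative counts only (not data from the experiment):
sample_rate = error_rate_percent(30, 12, per_week)

print(per_week, per_subject, total)  # 600 1800 79200
```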



V. ANALYSIS AND RESULTS

A. GENERAL

All analyses were performed using the MINITAB
statistical package [Ref. 28]. Repeated measures analyses
of variance procedures were performed in accordance with
guidance provided by Bruning and Kintz [Ref. 29]. Non-
parametric tests for significance between pairs of means,
several independent samples, and for trend analysis were
conducted utilizing procedures discussed by Conover [Ref.
30]. Additional parametric analysis followed procedures
prescribed by Ott [Ref. 31].

All mean error rates that appear in figures are of
untransformed data. Since the F test in an analysis of
variance is valid even with mild departures from the
assumption of equality of variances IRef. 31: p. 63U] , 
Hartley's Test for homogeneity of population variances was
used to determine whether an extreme case (unequal
variances) existed and thereby determine whether a
transformation of data would be required to stabilize the
variances. Results of this test are presented in Table IV.
The assumption of equal variances is the basis for the use of
untransformed data in all subsequent analyses.
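Hartley's test statistic is simply the ratio of the largest to the smallest group variance. The sketch below recomputes it from the four group variances reported in Table IV and compares it with the tabulated value quoted there; the function name `hartley_f_max` is hypothetical.

```python
def hartley_f_max(variances):
    """Hartley's F-max: largest sample variance over smallest."""
    variances = list(variances)
    return max(variances) / min(variances)


# Group variances as reported in Table IV.
group_variances = {
    "I": 1947.42,
    "II": 3666.80,
    "III": 2625.82,
    "IV": 5636.95,
}

CRITICAL_VALUE = 5.67  # tabulated value quoted in Table IV (.05 level)

f_max = hartley_f_max(group_variances.values())
reject_equal_variances = f_max > CRITICAL_VALUE

print(round(f_max, 3), reject_equal_variances)  # 2.895 False
```

Because the statistic falls below the tabulated value, the hypothesis of equal variances is not rejected, which is the decision recorded in Table IV.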

TABLE IV

TEST FOR EQUALITY OF VARIANCES

DATA:
    s^2 (Group I)   = 1947.42      s^2 (Group II) = 3666.80
    s^2 (Group III) = 2625.82      s^2 (Group IV) = 5636.95

HYPOTHESES:
    H0: All population variances are equal
    H1: Not all population variances are the same

TEST STATISTIC:
    F(max) = s^2 (max) / s^2 (min) = 2.895

DECISION:
    Level of significance: .05
    Tabulated value of F(max) = 5.67
    CANNOT REJECT THE NULL HYPOTHESIS

The correlation coefficient reported herein is
Spearman's Rho. Although the Pearson Product Moment
correlation coefficient 'r' is most commonly reported, it
is a random variable and as such has a distribution
function. Conover [Ref. 30] states that 'r' has no value as
a test statistic in nonparametric tests unless the
distribution is known.
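For readers unfamiliar with Spearman's Rho, it is the ordinary product-moment correlation applied to the ranks of the observations rather than to the observations themselves. The following is a minimal pure-Python sketch, with tied values given the average of the ranks they span; the function names are hypothetical and the sample data are illustrative, not taken from the experiment.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Illustrative data: a monotone but non-linear relationship.
satisfaction = [1, 2, 3, 4, 5]
error_rate = [9.0, 7.5, 7.0, 4.0, 1.0]
rho = spearman_rho(satisfaction, error_rate)
```

Because only the ordering matters, any strictly decreasing relationship such as the one above yields a coefficient of -1, whatever its shape.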



B. OCCUPATIONAL CHARACTERISTICS

1. Hypotheses

The following hypotheses pertaining to the
occupational characteristics of speakers using voice
recognition equipment were tested:

a. H0: Job function (microphone-experienced users
       versus non-microphone-experienced users)
       will have no effect on recognition
       accuracy.
   H1: Job function (microphone experience)
       affects recognition accuracy.

b. H0: The branch of service the military member
       belongs to will have no effect on
       recognition accuracy.
   H1: Recognition accuracy is influenced by the
       branch of service of the user.

c. H0: A user's attitude pertaining to his/her
       present job satisfaction will have no
       effect on recognition accuracy.
   H1: Job satisfaction affects recognition
       accuracy.

d. H0: The degree of satisfaction a user derives
       from being a member of the military will
       not affect recognition accuracy.
   H1: Service satisfaction has an effect on
       recognition accuracy.

e. H0: The amount of previous computer experience
       a user has had will not affect recognition
       accuracy.
   H1: Previous computer experience affects
       recognition accuracy.

f. H0: Competency in a foreign language (bi- or
       multilingual) will have no effect on
       recognition accuracy.
   H1: Competency in a foreign language will
       affect recognition accuracy.



2. Job Function

The results of the experiment for users with and
without microphone experience are shown graphically in
Figure 12. Microphone-experienced users fared only slightly
better than non-microphone-experienced users. The analysis
of variance (ANOVA) results in Table V substantiate this,
showing an F ratio of .377, indicating no statistically
significant difference due to the user's job function. Thus,
the null hypothesis cannot be rejected.

Figure 12. Mean Error Rate vs. Job Function
[Chart: mean % error rate (scale 1.0 to 8.0) for users with
microphone experience and with no microphone experience]



TABLE V

ANALYSIS OF VARIANCE FOR RECOGNITION ACCURACY

SOURCE                        SS      df        MS         F     P
TOTAL                   73295.91     131        --        --    --
BETWEEN SUBJECTS        54082.50      43        --        --    --
  Microphone
  Experience (MIC)        436.81       1    436.81      .377    NS
  Training
  Method (TNG)           5629.50       1   5629.50     4.868    **
  MIC x TNG              1759.59       1   1759.59     1.521    NS
  Error (b)             46256.60      40   1156.41        --    --
WITHIN SUBJECTS         19213.41      88        --        --    --
  Trials (TR)            4324.19       2   2162.09    11.696    **
  TR x MIC                 13.50       2      6.75      .037    NS
  TR x TNG                 74.32       2     37.16      .201    NS
  TR x MIC x TNG           13.00       2      6.50      .035    NS
  Error (w)             14788.40      80    184.85        --    --

[ **: SIGNIFICANT at p < .05 ]
[ NS: NOT SIGNIFICANT for p < .05 ]

Microphone Experience: Experienced vs. Non-experienced
Training Method: Supervised vs. Non-supervised
Trials: Week #1 (Words 1-100)
        Week #2 (Words 100-1)
        Week #3 (Words in random order)
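The F ratios in Table V can be checked from the sums of squares and degrees of freedom alone: each between-subjects effect is tested against Error (b), and each within-subjects effect against Error (w). The sketch below performs that check using the values printed in the table; the function and variable names are hypothetical, and small differences in the third decimal place relative to the printed table are to be expected from rounding.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = mean square of the effect / mean square of its error term."""
    return (ss_effect / df_effect) / (ss_error / df_error)


# (sum of squares, degrees of freedom) as printed in Table V.
ERROR_B = (46256.60, 40)   # between-subjects error term
ERROR_W = (14788.40, 80)   # within-subjects error term

between_effects = {
    "MIC": (436.81, 1),
    "TNG": (5629.50, 1),
    "MIC x TNG": (1759.59, 1),
}
within_effects = {
    "TR": (4324.19, 2),
    "TR x MIC": (13.50, 2),
    "TR x TNG": (74.32, 2),
    "TR x MIC x TNG": (13.00, 2),
}

f_values = {}
for name, (ss, df) in between_effects.items():
    f_values[name] = f_ratio(ss, df, *ERROR_B)
for name, (ss, df) in within_effects.items():
    f_values[name] = f_ratio(ss, df, *ERROR_W)

for name, value in f_values.items():
    print(f"{name:16s} F = {value:.3f}")
```

The same sums of squares also add up as a mixed design requires: the four between-subjects entries sum to 54082.50 and the five within-subjects entries to 19213.41.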



Mean total error rates for microphone- and non-
microphone-experienced users are summarized in Table VI. The
definite decrease in error rates over time will be discussed
later in the review of operational characteristics.



TABLE VI

MEAN TOTAL ERROR RATES FOR JOB FUNCTION BY WEEKS
(in Percent)

                        MICROPHONE    NO MICROPHONE    MEAN
                        EXPERIENCE    EXPERIENCE       (WEEKS)
WEEK #1                    7.04           7.78          7.41
WEEK #2                    6.23           6.71          6.47
WEEK #3                    4.79       [illegible]       5.09
MEAN (JOB FUNCTION)        6.02           6.63          6.32



3. Branch of Service

Three branches of service were represented in the
experiment, with civilian subjects categorized as a fourth
branch. A Kruskal-Wallis test for k > 2 samples was used to
determine if any differences existed. Table VII provides
the synopsis of results. The null hypothesis, that branch
of service will not affect recognition accuracy, is clearly
rejected. Multiple comparisons were made to determine
between which pairs of means the differences occurred. The
results of this test indicated significant differences
between Army/Navy and Army/Air Force. Differences between
Civilian/Army, Civilian/Air Force, Civilian/Navy, and
Navy/Air Force were not significant.

Further inspection of these results indicated
possible confounding due to experience with voice
recognition equipment. All Air Force personnel and 2 cut of 
£ Navy personnel were experienced users. Segregating the 
experienced and naive users into separate categories and
then reconducting the analysis for effect by branch of
service showed no statistical significance (Table VII).
Using the original hypotheses established, the null cannot
be rejected in either the naive-only or experienced-only
cases. Mean error rates by branch of service for all,
naive-only, and experienced-only subjects are presented
graphically in Figure 13.
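As an illustration of the test used in this section, the sketch below computes the Kruskal-Wallis H statistic in pure Python, with the usual correction for tied ranks. It is a generic textbook form, not the MINITAB routine used in the analysis; the function name and the sample error rates are hypothetical. For four groups, H is compared with the chi-square distribution on 3 degrees of freedom, whose .05 critical value is 7.815.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for a list of samples, corrected for ties.

    Assumes the pooled observations are not all identical.
    """
    pooled = [(value, g) for g, sample in enumerate(groups) for value in sample]
    pooled.sort(key=lambda item: item[0])
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        shared = (i + j) / 2 + 1          # average rank for this tie block
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += shared
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(sample) for r, sample in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


CHI2_CRITICAL_DF3_05 = 7.815  # chi-square critical value, 3 df, alpha = .05

# Hypothetical percent error rates for four groups (illustration only).
sample_groups = [
    [5.1, 6.3, 4.8],
    [8.9, 9.4, 7.7, 10.2],
    [3.0, 2.6, 3.9],
    [2.1, 1.8, 2.4],
]
h_statistic = kruskal_wallis_h(sample_groups)
significant = h_statistic > CHI2_CRITICAL_DF3_05
```

A significant H says only that at least one group differs; locating the differing pairs requires the kind of multiple comparison procedure described in the text.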



TABLE VII

EFFECT BY BRANCH OF SERVICE

+----------------+----------------+----------------+----------------+
|                | ALL SUBJECTS   | NAIVE          | EXPERIENCED    |
+----------------+----------------+----------------+----------------+
| Type of Test   | Kruskal-Wallis | Kruskal-Wallis | Kruskal-Wallis |
| Alpha          | .05            | .05            | .05            |
| Test Statistic | 11.90 **       | 2.79           | .23            |
| Critical Level | .0075          | .25            | .90            |
+----------------+----------------+----------------+----------------+
** = Significant at stated level of significance
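
The critical levels reported with the Kruskal-Wallis tests can be checked against the Chi-square distribution with k - 1 degrees of freedom, which is the reference distribution this chapter uses for that test. The sketch below is an illustration under that assumption. `chi2_sf` is a hypothetical helper, not a routine from the study, and it builds the upper-tail probability from the standard recurrence on the degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for Chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2                  # closed form for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1       # closed form for df = 1
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


# Four branch categories give 3 degrees of freedom.
print(f"critical level for H = 11.90: {chi2_sf(11.90, 3):.4f}")
print(f"critical level for H = 2.79:  {chi2_sf(2.79, 3):.2f}")
```

For the all-subjects statistic of 11.90 this gives a value of about .008, in line with the tabulated critical level of .0075.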



[Plot: mean % error rate (vertical axis, 1.0 to 8.0) for
Civilian, Army, Navy and Air Force subjects.]

Figure 13. Mean Error Rate vs. Branch of Service

4. Job and Service Satisfaction

Subjects were divided into four groups based upon
their subjective responses:

a. Persons who disliked their jobs

b. Those who were borderline or neutral in their
feelings

c. Individuals who liked their present job

d. Persons who indicated a very definite liking of
their job (liked their job very much)

The attained test statistic (Table VIII) leads to the
decision that the null hypothesis cannot be rejected. The
correlation coefficient between the two variables was not
significant, and it is concluded that there is no apparent
correlation between the satisfaction a user has for his/her



TABLE VIII

EFFECT BY JOB/SERVICE SATISFACTION

+-------------------------+------------------+----------------------+
|                         | JOB SATISFACTION | SERVICE SATISFACTION |
+-------------------------+------------------+----------------------+
| Type of Test            | Kruskal-Wallis   | Kruskal-Wallis       |
| Alpha                   | .05              | .05                  |
| Test Statistic          | 4.60             | .219                 |
| Critical Level          | .20              | .90                  |
| Correlation Coefficient | .016             | .041                 |
+-------------------------+------------------+----------------------+
** = Significant at stated level of significance



current job and how well that user will perform with voice
recognition equipment. This particular human factor is
nevertheless worthy of further examination in the future in
terms of users whose current job entails the day to day use
of voice equipment.

In the analysis of the effect service satisfaction
has on recognition accuracy, the 2 civilians were removed
from the sample population. Subjects were now divided into
three groups based upon their subjective responses:

a. Those who are unsatisfied or don't care

b. Those who are reasonably satisfied

c. Those who are very satisfied with their
respective service

The test statistic (Table VIII) reveals no significant
difference between groups, and therefore the null hypothesis,
that the degree of satisfaction a speaker derives from being
in the armed services will not affect recognition accuracy,
cannot be rejected. Correlation between service
satisfaction and total error rates, as before, was not
significant, thus indicating little or no correlation
between the random variables.

5. Previous Computer Experience

Subjects were subjectively divided into four groups
based upon their response to question #32 in User
Questionnaire #1 and included persons with:

a. No experience

b. Very little experience

c. Some or moderate experience

d. Considerable experience (data processors)

The analysis provided a test statistic (Table IX) which
resulted in the rejection of the null hypothesis and the
conclusion that previous computer experience will affect
recognition accuracy. Multiple comparisons were performed
to determine which pairs of means differed. Significant
differences occurred between users with no and considerable
experience, very little and moderate experience, and very
little and considerable experience. These results
demonstrate that experience with data/keyboard input
procedures provides higher recognition accuracy.
This occurrence may be attributed to, for example, a data
processor's awareness of the time involved in manual entry
and of its associated error rate. The advantages that voice
input offers to computer-experienced personnel may well be
a psychological or motivational factor in addition to an
occupational characteristic.

These results are further substantiated by the
computed correlation coefficient. Performing a one-tail
test for negative correlation, with mutual independence as
the null hypothesis, we were able to reject this hypothesis
and conclude that as computer experience increases,
recognition error rates will decrease (Critical
Level: < .001). Mean error rates for the four groups are
shown graphically in Figure 14.
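
The one-tail test for negative correlation can be illustrated as follows. This is a sketch under stated assumptions: the text does not name the coefficient, so a rank (Spearman) correlation is assumed here, the large-sample approximation z = rho * sqrt(n - 1) is assumed for the critical level, n = 44 is taken from the subject counts reported elsewhere in this chapter, and the sample data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    ranks = []
    for v in values:
        below = sum(1 for w in values if w < v)
        equal = sum(1 for w in values if w == v)
        ranks.append(below + (equal + 1) / 2.0)
    return ranks


def spearman_rho(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def lower_tail_p(rho, n):
    """Approximate one-tail critical level for H1: negative correlation."""
    z = rho * math.sqrt(n - 1)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z


# Hypothetical data: experience group (0 = none ... 3 = considerable)
# and total error rate in percent.
experience = [0, 0, 1, 1, 2, 2, 3, 3]
error_rate = [8.1, 7.4, 7.9, 6.8, 5.5, 6.1, 3.9, 4.4]
print(f"rho = {spearman_rho(experience, error_rate):.3f}")

# Critical level for the coefficient reported in Table IX, assuming n = 44.
print(f"p = {lower_tail_p(-0.516, 44):.5f}")
```

Under these assumptions the reported coefficient of -.516 yields a critical level below .001, consistent with the value quoted in the text.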

TABLE IX

EFFECT OF COMPUTER EXPERIENCE

+-------------------------+---------------------+
|                         | COMPUTER EXPERIENCE |
+-------------------------+---------------------+
| Type of Test            | Kruskal-Wallis      |
| Alpha                   | 0.05                |
| Test Statistic          | 14.287 **           |
| Critical Level          | < .005              |
| Correlation Coefficient | -.516 **            |
+-------------------------+---------------------+
** = Significant at stated level of significance





[Plot: mean % error rate (vertical axis, 1.0 to 8.0) for the
four experience groups: None, Very Little, Moderate,
Considerable.]

Figure 14. Mean Error Rate vs. Computer Experience

6. Foreign Language Competency 

Recognition accuracy was compared between two
groups, those with a fluent proficiency in a foreign
language and those without. Thirty-two subjects possessed no
capability in a second language, whereas 11 were competent
in one or more languages. The median total error rate for
both groups was 6.28%. A two-sample non-parametric test,
the Mann-Whitney, was performed to detect the existence of
any differences between the two groups. The computed test
statistic (Table X) clearly shows no significance at the .05
level, and therefore the null hypothesis cannot be rejected.
The critical regions for this two-tail test included values
of the test statistic less than 672 or greater than 814.8.
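
The magnitudes reported for the Mann-Whitney tests are consistent with the rank-sum form of the statistic: the sum of the ranks that one sample receives when both samples are pooled and ranked together. A minimal sketch of that computation follows. The function name and the data are hypothetical, and tied observations receive the mean of the ranks they span, which is how half-integer values such as 764.5 can arise.

```python
def rank_sum(sample, other):
    """Sum of the ranks of `sample` within the pooled data (ties averaged)."""
    pooled = list(sample) + list(other)
    total = 0.0
    for v in sample:
        below = sum(1 for p in pooled if p < v)
        equal = sum(1 for p in pooled if p == v)
        # Tied values occupy ranks below+1 .. below+equal; use their mean.
        total += below + (equal + 1) / 2.0
    return total


# Hypothetical total error rates (percent).
no_second_language = [6.3, 7.1, 5.8, 6.9, 4.4, 8.0]
second_language = [6.3, 5.2, 7.5]
t = rank_sum(no_second_language, second_language)
print(f"T = {t}")
```

The statistic is then compared with the lower and upper critical values for the two sample sizes involved.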



TABLE X

EFFECT OF COMPETENCY IN ANOTHER LANGUAGE

+----------------+------------------+
|                | FOREIGN LANGUAGE |
+----------------+------------------+
| Type of Test   | Mann-Whitney     |
| Alpha          | 0.05             |
| Test Statistic | 764.5            |
| Critical Level | .5776            |
+----------------+------------------+
** = Significant at stated level of significance





C. OPERATIONAL CHARACTERISTICS

1. Hypotheses

The following hypotheses apply to the operational
characteristics under which the subjects were tested.

a. H0: The method of training a user for voice
       recognition operation (supervised versus
       non-supervised) will not affect recognition
       accuracy.
   H1: Method of training will affect recognition
       accuracy.

b. H0: The time of day in which a user trains the
       equipment will not affect recognition
       accuracy.
   H1: Recognition accuracy of the user will be
       affected by the time of day in which he/she
       trains the voice recognizer.

c. H0: The period of the week in which the user
       trains the equipment will not affect
       recognition accuracy.
   H1: The period of the week in which the
       equipment is trained will affect
       recognition accuracy.

d. H0: Experienced users will acquire the same or
       greater error rates than inexperienced
       (naive) users.
   H1: Experienced users will have lower error
       rates than naive users.

   H0: Recognition accuracy will not be affected
       by weekly experience.
   H1: A user will demonstrate reduced error rates
       (decreasing trend) as experience with
       voice recognition equipment increases.

e. H0: The operational ease with which voice
       recognition equipment may be used will have
       no effect on recognition accuracy.
   H1: Ease of use will affect recognition
       accuracy.



2. Method of Training

The results of the experiment for users receiving
either supervised or non-supervised training are depicted
graphically in Figure 15. Users who received supervision in
the training mode fared significantly better than those who
did not. The analysis of variance (ANOVA) in Table V
substantiates this claim, providing an F ratio of 4.S6S and a
critical level of approximately .035. Thus, the null
hypothesis is rejected and we may conclude that the method
of training does affect recognition accuracy. Mean total
error rates for supervised and non-supervised users are
summarized in Table XI.




[Plot: mean % error rate (vertical axis) for Supervised
Training and Non-Supervised Training.]

Figure 15. Mean Error Rate vs. Training Method



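TABLE XI

MEAN TOTAL ERROR RATES FOR METHOD OF TRAINING BY WEEKS
(in Percent)

+----------------+------------+----------------+---------+
|                | SUPERVISED | NON-SUPERVISED | X WEEKS |
|                | TRAINING   | TRAINING       |         |
+----------------+------------+----------------+---------+
| WEEK #1        | 6.21       | £.64           | 7.41    |
| WEEK #2        | 6.22       | ?.63           | 6.47    |
| WEEK #3        | 4.1?       | 6.00           | 6.09    |
| X JOB FUNCTION | 6.22       | 7.41           | 6.22    |
+----------------+------------+----------------+---------+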



3. Time of Day and Week

Subjects were blocked by time of day, morning and
afternoon, and by time of week: early (Monday-Tuesday), mid
(Wednesday-Thursday) or late (Friday-Saturday). A
Mann-Whitney test was performed to determine if differences
existed between the two time of day groups. Morning users
had a median error rate of 5.1% while afternoon users had a
6.67% error rate. Because of equal sample sizes, a
parametric t-test was performed to confirm the results of the
non-parametric test. The results presented in Table XII will
not allow us to reject the null hypothesis. Critical regions
for the Mann-Whitney test included values of the test
statistic less than 411.5 and greater than 576.5.

With three groups in the time of week variable, the
analysis utilized the Kruskal-Wallis test for determination
of differences among the groups. The null hypothesis cannot
be rejected with a test statistic less than 5.99, the
Chi-square value with two degrees of freedom. The
correlation coefficient was found to be significant at the
0.05 level in a test for negative correlation. A premature
conclusion that training occurring in the latter portion of
the week would yield lower error rates appeared to be
counter-intuitive. It was thought that fatigue and the
interruption of a weekend would result in poorer training
efforts and hence lead to higher error rates in the future.
Upon further analysis, this reversed correlation was found
to be the result of possible confounding arising from the
large number of experienced users who trained in the later
period of the week. Eight out of thirteen late week users
were experienced, and with their removal from consideration,
the correlation between time of week and total error rate
became statistically non-significant.



TABLE XII

EFFECT OF TIME OF DAY AND WEEK

+----------------+---------------------------+----------------+
|                | TIME OF DAY               | TIME OF WEEK   |
+----------------+--------------+------------+----------------+
| Type of Test   | Mann-Whitney | t-test     | Kruskal-Wallis |
| Alpha          | 0.05         | 0.05       | 0.05           |
| Test Statistic | 469          | -1.16      | 4.14           |
| Critical Level | .275         | [illegible]| .25            |
| Correlation    | .093         | .092       | -.267 **       |
| Coefficient    |              |            |                |
+----------------+--------------+------------+----------------+
** = Significant at stated level of significance



4. User Experience

Two sets of hypotheses in Section V.C.1.d are
incorporated into this phase of the analysis. The analysis
of the first set was performed using the Mann-Whitney test
and the associated results are summarized in Table XIII.
The median error rate for naive users was 7.28% while
experienced users attained a 2.75% error rate. Both groups
had equal numbers of supervised and unsupervised users. The
correlation coefficient yielded one of the strongest
correlations between two variables within the experiment.
The null hypothesis can be rejected and it is therefore
concluded that experience will affect recognition accuracy.



TABLE XIII

EFFECT DUE TO USER EXPERIENCE

+-------------------------+--------------+
|                         | EXPERIENCE   |
+-------------------------+--------------+
| Type of Test            | Mann-Whitney |
| Alpha                   | 0.05         |
| Test Statistic          | £69.0 **     |
| Critical Level          | < .0001      |
| Correlation Coefficient | -.599 **     |
+-------------------------+--------------+
** = Significant at stated level of significance



The analysis of the second hypothesis of V.C.1.d is
depicted graphically in Figure 16 (Trials by Job Function)
and Figure 17 (Trials by Training Method). In each case no
interaction is present, with the weekly error rate showing a
steady drop of approximately .£ to 1.4% each week. This
graphical interpretation is supported statistically by the
ANOVA presented in Table V. That is, the F ratio is well
above the 3.11 required for a level of significance of 0.05.
The null hypothesis is rejected and it is concluded that



[Plot: mean % error rate (vertical axis, 0.0 to 8.0) for
Week #1, Week #2 and Week #3, shown separately for subjects
with Microphone Experience and No Microphone Experience.]

Figure 16. Trials versus Job Function

[Plot: mean % error rate (vertical axis, 0.0 to 8.0) for
Week #1, Week #2 and Week #3, shown separately for
Supervised Training and Non-Supervised Training.]

Figure 17. Trials versus Training Method



users will improve (reduce) their error rates through
weekly iteration. This conclusion was further verified by
application of the Cox and Stuart Test for Trend. The
following comparisons were made between:

a. Week #1 and Week #2

b. Week #2 and Week #3

c. Week #1 and Week #3

In all three cases, the null hypothesis, that there is no
downward trend, was clearly rejected.
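
The Cox and Stuart procedure reduces to a sign test on paired observations. The sketch below illustrates that idea for one pair of weeks. It is an assumption that each subject's error rate in the earlier week is paired with the same subject's rate in the later week; the function name and the data are hypothetical.

```python
from math import comb


def downward_trend_p(earlier, later):
    """One-sided sign test: critical level for H1 'later values are lower'."""
    drops = sum(1 for a, b in zip(earlier, later) if b < a)
    rises = sum(1 for a, b in zip(earlier, later) if b > a)
    n = drops + rises  # tied pairs are discarded
    # P(X >= drops) for X ~ Binomial(n, 1/2) under the null hypothesis
    return sum(comb(n, k) for k in range(drops, n + 1)) / 2.0 ** n


# Hypothetical error rates (percent) for the same eight subjects.
week_1 = [7.9, 6.4, 8.8, 5.1, 9.3, 7.0, 6.6, 8.2]
week_2 = [6.8, 6.1, 7.5, 5.4, 8.0, 6.2, 6.6, 7.1]
p = downward_trend_p(week_1, week_2)
print(f"critical level = {p:.4f}, reject H0 at .05: {p < 0.05}")
```

A small critical level means that decreases outnumber increases by more than chance would explain, which is the sense in which the null hypothesis of no downward trend is rejected.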

5. Ease of Use

Based on subjective responses by those participating
in the experiment, four groups were categorized. They
include:

a. Users who consider voice recognition equipment
difficult to use.

b. Those who had no opinion either way.

c. Users who stated that voice equipment is easy to
use.

d. Those who feel that voice recognition equipment
is very easy to use.

The results of this analysis are summarized in Table XIV.
The test statistic is less than the Chi-square value of
y .42c with three degrees of freedom and therefore the null
cannot be rejected. The computed correlation coefficient is
not significant at the 0.05 level.



TABLE XIV

EFFECT DUE TO EASE OF USE OF VOICE EQUIPMENT

+-------------------------+----------------+
|                         | EASE OF USE    |
+-------------------------+----------------+
| Type of Test            | Kruskal-Wallis |
| Alpha                   | 0.05           |
| Test Statistic          | 4.814          |
| Critical Level          | > .25          |
| Correlation Coefficient | .157           |
+-------------------------+----------------+
** = Significant at stated level of significance



D. PERSONAL CHARACTERISTICS

1. Hypotheses

The following hypotheses were tested pertaining to
the personal characteristics of voice recognition users:

a. H0: Race of the user will not affect
       recognition accuracy.
   H1: A difference in recognition accuracy exists
       between users of different race.

b. H0: The marital status of the user will not
       affect recognition accuracy.
   H1: A user's marital status will have an effect
       on his/her recognition accuracy.

   H0: Size of a user's family will not affect
       recognition accuracy.
   H1: Family size will have an effect on
       recognition accuracy.

c. H0: The religious preference/background of a
       user will have no effect on his/her
       recognition accuracy.
   H1: A user's religious preference/background
       will affect recognition accuracy.

d. H0: A person's accent will not affect his/her
       recognition accuracy.
   H1: Accent affects recognition accuracy.

e. H0: The place of birth of a user will have no
       effect on recognition accuracy.
   H1: One's place of birth affects recognition
       accuracy.

   H0: The geographic origin of a person will not
       affect his or her recognition accuracy.
   H1: A person's recognition accuracy will be
       affected by geographic origin.

f. H0: The level of education an individual has
       attained will not affect his/her
       recognition accuracy.
   H1: Education level of a user affects
       recognition accuracy.

g. H0: The socio-economic class of a user will not
       affect recognition accuracy.
   H1: A user's recognition accuracy will be
       affected by socio-economic class standing.

h. H0: Past oral surgery or orthodontic care will
       not affect recognition accuracy of the
       user.
   H1: Recognition accuracy of the user will be
       affected if he or she has undergone oral
       surgery or orthodontic care.



2. Race

Two racial backgrounds were represented in the
sampled population. Thirty-eight Caucasian and six Negro
subjects participated in the experimentation. The median
total error rate was 6% for Caucasian personnel and 6.8% for
Negro users. A Mann-Whitney test was performed to detect
the presence of any difference between the two groups. The
calculated test statistic (Table XV) was not significant at
the .05 level and the null hypothesis cannot be rejected.
Critical regions for the test statistic in this two-tail
test were values less than 79? and greater than 912.
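
Critical regions of this kind can be approximated from the two sample sizes alone. The sketch below assumes the large-sample normal approximation to the rank sum of the larger group. That is an assumption, since the text does not state how the critical regions were obtained, and the function name is hypothetical.

```python
import math


def rank_sum_critical_region(n, m, z=1.96):
    """Two-tail bounds for the rank sum of the sample of size n.

    Under H0 the rank sum has mean n(N + 1)/2 and variance nm(N + 1)/12,
    where N = n + m. z = 1.96 corresponds to alpha = .05, two-tail.
    """
    total = n + m
    mean = n * (total + 1) / 2.0
    sd = math.sqrt(n * m * (total + 1) / 12.0)
    return mean - z * sd, mean + z * sd


# Group sizes from the race comparison: 38 and 6 subjects.
lower, upper = rank_sum_critical_region(38, 6)
print(f"reject H0 if T < {lower:.1f} or T > {upper:.1f}")
```

For groups of 38 and 6 this gives bounds of roughly 798 and 912, which agrees with the upper critical value quoted above.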



TABLE XV

EFFECT OF RACE ON RECOGNITION ACCURACY

+----------------+--------------+
|                | RACE         |
+----------------+--------------+
| Type of Test   | Mann-Whitney |
| Alpha          | 0.05         |
| Test Statistic | 843.0        |
| Critical Level | .6941        |
+----------------+--------------+
** = Significant at stated level of significance



3. Marital Status and Family Size

The sample population consisted of 14 single, 25
married, 3 divorced, and 2 other (separated, widowed)
personnel. A Kruskal-Wallis test for k > 2 samples was used
to determine if any differences in means existed between the
groups. Because the computed test statistic (Table XVI) is
less than 7.815, the tabulated Chi-square value with 3
degrees of freedom, the null hypothesis cannot be rejected.
No correlation coefficient was computed for marital status
due to the nominal scale of measurement.



TABLE XVI

EFFECT OF MARITAL STATUS AND FAMILY SIZE

+-------------------------+----------------+----------------+
|                         | MARITAL STATUS | FAMILY SIZE    |
+-------------------------+----------------+----------------+
| Type of Test            | Kruskal-Wallis | Kruskal-Wallis |
| Alpha                   | .05            | .05            |
| Test Statistic          | 2.81           | .219           |
| Critical Level          | > .3           | > .3           |
| Correlation Coefficient | NA             | .043           |
+-------------------------+----------------+----------------+
** = Significant at stated level of significance


The sample population was subdivided into five groups
for family size, with a range from no children to subjects
having four or more children. A Kruskal-Wallis test was
again used to determine if a difference existed and, as
before, the null hypothesis cannot be rejected. The
computed correlation coefficient indicates mutual
independence between family size and total error rate of a
voice recognition user.



4. Religious Preference

Although a diverse variety of religious preferences
were enumerated by participating subjects, some were pooled
to preclude numerous sample sizes of just one person. For
example, Methodist and Episcopalian were combined into the
Protestant category and so forth. In all, six groups were
represented and included Catholic, Protestant, Jewish,
Baptist, No Preference and Others (those who could not be
readily grouped into one of the aforementioned categories).
Using the Kruskal-Wallis test to check for differences
between means, the obtained test statistic (Table XVII) does
not allow for the rejection of the null hypothesis.
Therefore, it may be concluded that the religious preference
of the user will not affect his/her recognition accuracy.



TABLE XVII

EFFECT OF RELIGIOUS PREFERENCE

+----------------+----------------------+
|                | RELIGIOUS PREFERENCE |
+----------------+----------------------+
| Type of Test   | Kruskal-Wallis       |
| Alpha          | 0.05                 |
| Test Statistic | 3.25                 |
| Critical Level | > .25                |
+----------------+----------------------+
** = Significant at stated level of significance



5. Accent

Ten subjects possessed some type of noticeable
accent, as determined by the subject and experiment
administrator. Seven were Southern and three were
categorized as Other (Spanish, Bostonian). Remaining
subjects were placed in a 'No Accent' group. The resultant
test statistic (Table XVIII) was slightly less than the
tabulated Chi-square value of 5.991 with two degrees of
freedom. As such, the null hypothesis cannot be rejected.
An additional check was accomplished by combining the two
accent groups into one generic entity and performing a
Mann-Whitney test to detect a difference between the two
groups. Again the null hypothesis cannot be rejected at the
stated level of significance. Correlation analysis was not
performed due to the nominal scale of measurement.



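TABLE XVIII

EFFECT OF ACCENT ON RECOGNITION ACCURACY

+----------------+----------------+--------------+
|                | ACCENT         | ACCENT       |
|                | (3 groups)     | (2 groups)   |
+----------------+----------------+--------------+
| Type of Test   | Kruskal-Wallis | Mann-Whitney |
| Alpha          | .05            | .05          |
| Test Statistic | 5.73           | 704          |
| Critical Level | .055           | .09          |
+----------------+----------------+--------------+
** = Significant at stated level of significance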



Although the null is not rejected, the critical level is
close to the stated level of significance. Mean error rates
are therefore illustrated in Figure 18 for further
examination.




[Plot: mean % error rate (vertical axis) for the No Accent,
Southern and Other groups.]

Figure 18. Mean Error Rate vs. Accent



6. Place of Birth and Geographic Origin

Subjects were asked to provide their state of birth
and their responses were subsequently classified into one of
the following six generic groups:

a. Overseas

b. Northeast United States

c. Southeast United States

d. Mid-Central United States

e. Southwest United States

f. Western United States

Applying the Kruskal-Wallis test to the compiled data, the
obtained test statistic (Table XIX) is insufficient to
reject the stated null hypothesis.

Because a person's place of birth is not necessarily
the environment in which that individual grew up (i.e.,
during ages 2-18), data pertaining to geographic origin was
also tested to determine if any negative effect would be
encountered. The geographic areas used were the same as for
place of birth. Calculated results point to the same
conclusion: the null hypothesis of Section V.D.1.e cannot
be rejected.



TABLE XIX

EFFECT OF PLACE OF BIRTH AND GEOGRAPHIC ORIGIN

+----------------+----------------+-------------------+
|                | PLACE OF BIRTH | GEOGRAPHIC ORIGIN |
+----------------+----------------+-------------------+
| Type of Test   | Kruskal-Wallis | Kruskal-Wallis    |
| Alpha          | .05            | .05               |
| Test Statistic | 5.32           | 4.39              |
| Critical Level | > .25          | > .25             |
+----------------+----------------+-------------------+
** = Significant at stated level of significance



7. Level of Education

The sampled population was partitioned into the
following five categories:

a. High School graduates.

b. Individuals with 1 to 4 years of college but no
degree.

c. College graduates.

d. Individuals working toward a graduate degree.

e. Persons accorded a graduate degree such as a
Masters or Doctorate.

The data obtained from the five groups was tested
for any significant difference between groups. The test
statistic (Table XX) leads to the rejection of the null
hypothesis and the conclusion that level of education
affects the overall error rate for voice recognition users.
A relatively strong correlation exists, with a
critical level of 0.006. That is, as the individual
increased in level of education, a concomitant decrease in
error rate occurred.

Multiple comparisons between the various groups
showed the predominant influence to be graduate students.
Further examination indicated possible confounding due to
that group's prior experience with voice recognition
equipment. Eleven out of twelve graduate students were



TABLE XX

EFFECT OF LEVEL OF EDUCATION

+-------------------------+-----------------+-------------------+
|                         | EDUCATION (ALL) | EDUCATION (NAIVE) |
+-------------------------+-----------------+-------------------+
| Type of Test            | Kruskal-Wallis  | Kruskal-Wallis    |
| Alpha                   | .05             | .05               |
| Test Statistic          | 14.200 **       | 4.18              |
| Critical Level          | .015            | [illegible]       |
| Correlation Coefficient | -.250 **        | .062              |
+-------------------------+-----------------+-------------------+
** = Significant at stated level of significance



experienced users. These experienced users were stripped
out of the sample and the Kruskal-Wallis test applied to
only those that were naive to voice technology. Using the
same hypotheses, the obtained test statistic does not allow
for the rejection of the null. This, and the recomputed
correlation coefficient, corroborate the theory of
confounding, and the earlier conclusion is now amended to
state that level of education will not affect recognition
accuracy. Mean error rates for all education levels are
shown graphically in Figure 19. Error rates for both the
total sample population and naive users only are included.



[Plot: mean % error rate (vertical axis, 1.0 to 9.0) for the
five education groups (High School, 1-4 College, College
Grad, Grad Student, Grad Degree), with separate curves for
All Subjects and Naive Subjects.]

Figure 19. Mean Error Rate vs. Education



8. Socio-economic Class

A variety of socio-economic classes were presented
to the participants for selection, with one of the following
five chosen by each subject:

a. Upper lower class

b. Lower middle class

c. Middle class

d. Upper middle class

e. Lower upper class

The analysis of total error rates for these five groups
(Table XXI) yielded a test statistic that would not allow
for the rejection of the null hypothesis, and it may be
concluded that socio-economic class will not affect
recognition accuracy. The negative correlation indicates
that individuals of a lower socio-economic class tend to
acquire higher error rates, although the coefficient is not
significant at the 0.05 level (critical level: 0.156).



TABLE XXI

EFFECT OF SOCIO-ECONOMIC CLASS

+-------------------------+----------------------+
|                         | SOCIO-ECONOMIC CLASS |
+-------------------------+----------------------+
| Type of Test            | Kruskal-Wallis       |
| Alpha                   | 0.05                 |
| Test Statistic          | 1.95                 |
| Critical Level          | .83                  |
| Correlation Coefficient | -0.152               |
+-------------------------+----------------------+
** = Significant at stated level of significance



9. Dental

Subjects were queried as to their history of dental
care, in particular, oral surgery and/or orthodontic
correction. Two groups resulted, upon whose data a
Mann-Whitney test was performed to determine if any
difference existed between them. The null hypothesis cannot
be rejected given the computed test statistic (Table XXII).
Critical regions for the test statistic included values
greater than 714.69 and less than 635.31.



TABLE XXII

EFFECT OF PAST AND/OR PRESENT DENTAL CARE

+----------------+--------------+
|                | DENTAL CARE  |
+----------------+--------------+
| Type of Test   | Mann-Whitney |
| Alpha          | 0.05         |
| Test Statistic | 638.50       |
| Critical Level | .3643        |
+----------------+--------------+
** = Significant at stated level of significance



E. PHYSIOLOGICAL CHARACTERISTICS

1. Hypotheses

The following hypotheses pertaining to various
physiological characteristics of voice recognition equipment
users were tested.

a. H0: The user's age will not affect his/her
       recognition accuracy.
   H1: Age will affect the total error rates of
       users of voice recognition equipment.

b. H0: The height and weight of an individual
       using voice technology will not affect
       overall recognition accuracy.
   H1: Recognition accuracy will be affected by an
       individual's weight.

c. H0: The vital capacity and rate of air flow of
       a user will not affect his/her recognition
       accuracy.
   H1: Recognition accuracy will be affected by a
       person's vital capacity and rate of air
       flow.

d. H0: The overall physical condition of the user
       will not affect his/her recognition
       accuracy.
   H1: Recognition accuracy will be affected by one's
       physical condition.

   H0: Formal speech and/or voice training will
       not affect recognition accuracy.
   H1: A user's recognition accuracy will be
       affected by any formal speech or voice
       training/therapy.



2. Age

The subjects ranged in age from 20 to 47 and were
divided into five groups for purposes of the analysis.
These groups and their mean error rates are:

a. 20 to 24 (4.68%)

b. 25 to 26 (7.03%)

c. 27 to 31 (7.15%)

d. 32 to 35 (5.73%)

e. 36 +     (6.10%)

These five groups were tested to detect differences
among their means. The obtained results (Table XXIII) show
that the null hypothesis, stated above, cannot be rejected
and that the two variables, age and total error rate, are
mutually independent.



TABLE XXIII

EFFECT ON RECOGNITION ACCURACY DUE TO AGE

+-------------------------+----------------+
|                         | AGE            |
+-------------------------+----------------+
| Type of Test            | Kruskal-Wallis |
| Alpha                   | 0.05           |
| Test Statistic          | 2.26           |
| Critical Level          | > .50          |
| Correlation Coefficient | -0.05          |
+-------------------------+----------------+
** = Significant at stated level of significance



3. Height and Weight

Subjects ranged in height from 60 to 77 inches.
Four groups were generated for analysis and are listed below
with their respective mean error rates.

a. 60 to 64 inches (6.46%)

b. 65 to 69 inches (6.67%)

c. 70 to 72 inches (5.29%)

d. 73 to 77 inches (7.14%)

The results of the analysis, as summarized in Table XXIV,
indicate that the null hypothesis cannot be rejected. The
small positive correlation coefficient is not significant at
the .05 level and thus the variables in question may be
considered to be independent.



Weights of the subjects ranged from 110 to 240
pounds. Examination for some natural 'break' points in this
range resulted in the creation of the following five groups
and their corresponding mean error rates.

a. 110 to 125 pounds (6.4£%)

b. 126 to 145 pounds (6.65%)

c. 146 to 175 pounds (5.13%)

d. 176 to 199 pounds (7.18%)

e. 200 + pounds      (5.88%)

The null hypothesis cannot be rejected, with the correlation
coefficient indicating independence between the two
variables.



TABLE XXIV

EFFECT OF HEIGHT AND WEIGHT ON RECOGNITION ACCURACY

+-------------------------+----------------+----------------+
|                         | HEIGHT         | WEIGHT         |
+-------------------------+----------------+----------------+
| Type of Test            | Kruskal-Wallis | Kruskal-Wallis |
| Alpha                   | .05            | .05            |
| Test Statistic          | [illegible]    | [illegible]    |
| Critical Level          | > .50          | .75            |
| Correlation Coefficient | .121           | .064           |
+-------------------------+----------------+----------------+
** = Significant at stated level of significance



The similarity in test statistics and correlation
coefficients of height and weight may be explained by
observing the correlation between height and weight itself.
A Pearson product moment correlation of .321 suggests a
strong positive association between the two variables and
thus serves to confirm the similar results of the analysis.
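For reference, the Pearson product moment correlation used here can be computed as in the following minimal sketch. The function name and the height and weight values are hypothetical and serve only to show the arithmetic; they are not the measurements of the subjects.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("samples must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights (inches) and weights (pounds); NOT the thesis data.
heights_in = [62, 66, 70, 74]
weights_lb = [120, 150, 165, 200]
r_height_weight = pearson_r(heights_in, weights_lb)
```

A value near +1 indicates that taller subjects in the sample also tend to be heavier, which is the kind of association the paragraph above appeals to.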

4. Vital Capacity and Rate of Air Flow

The vital capacity of participating subjects ranged 
from 1917 to 5725 cubic centimeters. The following four 
groups were created: 

a. 1917 to 2850 cubic centimeters

b. 2351 to -5767 cubic centimeters

c. 2925 to 1450 cubic centimeters

d. 4658 to 5725 cubic centimeters

Analysis for differences between the means of the various
groups generated the test statistic (Table XXV) that
resulted in the rejection of the null hypothesis. A
correlation between increased vital capacity and low error
rates was found to be significant using a one-tail test for
negative correlation (critical level: .045).

The rate of airflow characteristic had a range of
212 to 721 liters per minute. This range was divided into
four and the following groups were used for the analysis:

a. 212 to 331 liters/min
b. 332 to 460 liters/min
c. 461 to 599 liters/min
d. 600+ liters/min



TABLE XXV

EFFECT OF VITAL CAPACITY AND RATE OF AIR FLOW

+-------------------------+----------------+------------------+
|                         | VITAL CAPACITY | RATE OF AIR FLOW |
+-------------------------+----------------+------------------+
| Type of Test            | Kruskal-Wallis | Kruskal-Wallis   |
| Alpha                   | .05            | .05              |
| Test Statistic          | 8.58 **        | 6.38             |
| Critical Level          | .3375          | .095             |
| Correlation Coefficient | -.267 **       | -.318 **         |
+-------------------------+----------------+------------------+
** = Significant at stated level of significance


For rate of air flow, the test statistic does not allow for
the rejection of the null, but a statistically significant
correlation coefficient provides an indication that as rate
of air flow increases, error rates will decrease. Figures 20
and 21 depict mean error rates for effects due to vital
capacity and rate of airflow. Figures 22 and 23 provide the
scatter plots upon which the correlation coefficients were
determined.



Figure 20. Mean Error Rate vs. Vital Capacity
(x-axis: vital capacity group, cubic centimeters; y-axis: mean % error rate)

Figure 21. Mean Error Rate vs. Rate of Air Flow
(x-axis: rate of air flow group, liters/minute: 212-331, 332-460, 461-599, 600+; y-axis: mean % error rate)



Figure 22. Scatter Plot for Vital Capacity
(x-axis: vital capacity; y-axis: errors, %)

Figure 23. Scatter Plot for Rate of Air Flow
(x-axis: rate of air flow; y-axis: errors, %)



The dilemma of a non-significant Kruskal-Wallis test
and a significant correlation coefficient can only be
explained by the subjective division of the range of flow
rates into the groups used for the analysis. Biased
grouping could provide a matrix that would yield a
significant test statistic to show a difference between
means but, in the final analysis, credibility for this
characteristic as a determinant in personnel selection would
be lost.

5. Physical Condition

Four groups resulted from the subjects' self-appraisal
of their general physical condition and include
categories of fair/poor, average, good and outstanding
physical condition. Their total error rates were examined
to determine if a difference between the groups existed.
The results presented in Table XXVI do not allow us to
reject the null hypothesis. Additionally, a negligible
correlation coefficient indicates the two variables to be
independent of one another.

Although a subjective response was the determinant
for this characteristic, seven subjects who had colds
trained the recognizer. Their condition was such that a
distinct nasality was present while they spoke. A
Mann-Whitney test was performed to determine if a difference
between the healthy and 'cold' groups existed. The test
statistic of Table XXVI further verifies our previous
conclusion; the null cannot be rejected. The critical
regions for the Mann-Whitney test correspond to values
greater than 8[?]2.8 and less than 771.4.

Finally, the analysis for the effect due to formal
speech therapy or voice training resulted in a test
statistic that would not allow for the rejection of the null
hypothesis, that speech therapy or voice training will not
affect a user's recognition accuracy. Critical regions
corresponded to values greater than [?]35 and less than 695.
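The form of the critical regions quoted above (a lower and an upper bound on the statistic) is consistent with a rank-sum version of the Mann-Whitney test using a large-sample normal approximation. The following minimal sketch rests on that assumption, which this section does not itself confirm; the function names are hypothetical, and since the group sizes are not stated here the reported bounds are not reproduced.

```python
import math
from statistics import NormalDist


def rank_sum(sample, other):
    """Sum of pooled-sample ranks belonging to `sample` (ties get mean ranks)."""
    pooled = sorted(list(sample) + list(other))
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return sum(rank_of[x] for x in sample)


def rank_sum_bounds(n, m, alpha=0.05):
    """Two-sided acceptance bounds for the rank sum of the size-n sample,
    using the normal approximation (assumed here, not taken from the thesis)."""
    total = n + m
    mean = n * (total + 1) / 2.0
    sd = math.sqrt(n * m * (total + 1) / 12.0)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return mean - z * sd, mean + z * sd


# Hypothetical group sizes of 10 and 10; NOT the thesis group sizes.
lower, upper = rank_sum_bounds(10, 10)
```

The null hypothesis is retained whenever the observed rank sum lies between the two bounds, which is the situation described for both the 'cold' and speech training comparisons.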



TABLE XXVI

EFFECT ON RECOGNITION ACCURACY DUE TO PHYSICAL CONDITION

+-------------------------+----------------+--------------+--------------+
|                         | PHYSICAL       | SPEECH       | COLD         |
|                         | CONDITION      | TRAINING     |              |
+-------------------------+----------------+--------------+--------------+
| Type of Test            | Kruskal-Wallis | Mann-Whitney | Mann-Whitney |
| Alpha                   | 0.05           | .05          | .05          |
| Test Statistic          | 2.57           | 761.00       | 821.5        |
| Critical Level          | .45            | .46          | .268         |
| Correlation Coefficient | [?].03         | NA           | NA           |
+-------------------------+----------------+--------------+--------------+
** = Significant at stated level of significance



F. PSYCHOLOGICAL CHARACTERISTICS

1. Hypotheses

a. H0: Anxiety will not affect the recognition
       accuracy of a user.

   H1: Anxiety will affect the total error rate of
       a user.

b. H0: The cooperativeness of a speaker will not
       affect his/her total error rate.

   H1: Speaker cooperativeness will affect
       recognition accuracy.

c. H0: The occurrence of recognition errors will
       not affect overall recognition accuracy.

   H1: A speaker's overall error rate will be
       affected by the psychological influence of
       mis- and non-recognitions.

d. H0: A speaker's beliefs in voice technology as
       a time saving job aid will not affect
       recognition accuracy.

   H1: The attitude a person possesses toward the
       influence of voice on a computer operator's
       job and their willingness to use voice
       because of this influence will affect
       recognition accuracy.

e. H0: The attitude a speaker has about computers
       and information processing will have no
       psychological effect on recognition
       accuracy.

   H1: A speaker's psychological attitude
       concerning automation and data processing
       will affect recognition accuracy.

2. Psychological Anxiety

The results of the State-Trait Anxiety Inventory are
depicted graphically in Figures 24 to 26. Figures 24 and 25
show some indication that individuals with a lower state
anxiety acquired fewer errors. The relationship between
error rate and trait anxiety, shown in Figure 26, depicts a
more randomized occurrence of error rates. Correlation
analysis substantiates this in that state anxiety during
week #1 is statistically significant, with week #2 showing
some positive correlation but not significant at the .05
level. There is no significant positive correlation between
trait anxiety and error rates.

The obtained STAI scores yielded a normal
distribution and equal sample sizes of high and low anxiety
users. With the basic assumptions for use of a parametric
test met, a two sample t-test was used to detect differences
between groups. Additionally, the non-parametric
Mann-Whitney test was applied for purposes of further
verification; however, it does not possess the power of its
parametric counterpart. Results of the analysis are
included in Table XXVII.
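The parametric comparison named above can be illustrated with a short sketch of a two sample t statistic. It assumes the pooled-variance (equal variance) form, which this section does not specify; the function name and the scores below are hypothetical and are not the thesis data.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Two sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical error rates (%) for low and high anxiety users; NOT the thesis data.
low_anxiety = [3.2, 4.5, 5.1, 4.0]
high_anxiety = [6.8, 5.9, 7.4, 6.1]
t_stat, t_df = pooled_t(low_anxiety, high_anxiety)
```

The resulting statistic is compared with the tabled t value for the stated degrees of freedom and alpha; a negative value here simply reflects that the first sample has the lower mean.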

In all cases using non-parametric analysis the null
hypothesis cannot be rejected, although the critical level
shows the test statistic to be just within the acceptance
region. The dichotomy in the trait anxiety analysis is
interesting; the more powerful parametric test allows the
rejection of the null hypothesis whereas the opposite exists
using the Mann-Whitney. In both instances, though, the test
statistic lies extremely close to the point separating the
acceptance and critical regions.

Figure 24. Mean Error Rate vs. State Anxiety (Week #1)
(scatter plot; x-axis: STAI evaluation; y-axis: errors, %)

Figure 25. Mean Error Rate vs. State Anxiety (Week #2)
(scatter plot; x-axis: STAI evaluation; y-axis: errors, %)

Figure 26. Mean Error Rate vs. Trait Anxiety
(scatter plot; x-axis: STAI evaluation; y-axis: errors, %)

The effect due to anxiety may be considered
inconclusive because of the resultant statistical analysis.
Although showing significant correlation in Week #1, any
anxiety in Week #2 may have been overcome or masked by
familiarity and experience with equipment and procedures.
By Week #3 and the administration of the Trait inventory,
subjects were thoroughly versed in the experimental
procedure. The inconsistent results nevertheless leave
reason to believe that anxiety has an effect on speech and
hence recognition accuracy, but the degree to which it does
remains a clouded issue.



TABLE XXVII

EFFECT ON RECOGNITION ACCURACY DUE TO ANXIETY

[Table body illegible in the source scan.]



3. Speaker Cooperativeness



Subjects evaluated their degree of cooperativeness 
on an interval scale with subsequent creation of the 
following groups. 

a. Less than cooperative speakers 

b. Moderately cooperative speakers 

c. Very cooperative speakers

d. Extremely cooperative speakers (subjects who 
marked the 'anchor point' of the scale) 

The results of the analysis are presented in Table XXVIII,
with mean error rates graphically represented in Figure 27.
The null hypothesis is rejected due to a test statistic
greater than the Chi-square value of 7.815. Multiple
comparisons among the groups reflect an existent difference
between the 'less than cooperative' and 'extremely
cooperative' speakers only. Despite indication of some
correlation between high cooperativeness and low error rate,
the computed coefficient is not significant at a .05 level
(Critical Level: 0.095).
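The relationship between a Kruskal-Wallis test statistic, the Chi-square value of 7.815 and the tabled critical levels can be checked numerically. The sketch below assumes the statistic is referred to a chi-square distribution with (number of groups - 1) degrees of freedom, so that four groups give 3 degrees of freedom; the function name is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)                 # Q(1, y)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))      # Q(1/2, y)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Four cooperativeness groups -> 3 degrees of freedom (an assumption of this sketch).
p_at_critical_value = chi2_sf(7.815, 3)    # close to alpha = .05
p_cooperativeness = chi2_sf(16.82, 3)      # statistic reported in Table XXVIII
p_air_flow = chi2_sf(6.38, 3)              # statistic reported in Table XXV
```

Under this assumption the value 7.815 corresponds to an upper-tail probability of about .05, the statistic 16.82 gives a probability below .005, and the statistic 6.38 gives roughly .095, in line with the critical levels listed in Tables XXVIII and XXV.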

Figure 27. Mean Error Rate vs. Speaker Cooperativeness
(bar values, mean % error rate: Less, [illegible]; Moderate, 6.43; Very, 6.27; Extremely, 5.1)

TABLE XXVIII

EFFECT OF SPEAKER COOPERATION AND PARTICIPATION

+-------------------------+-----------------+----------------+
|                         | COOPERATIVENESS | PARTICIPATION  |
+-------------------------+-----------------+----------------+
| Type of Test            | Kruskal-Wallis  | Kruskal-Wallis |
| Alpha                   | .05             | .05            |
| Test Statistic          | 16.82 **        | 4.76           |
| Critical Level          | < .005          | .095           |
| Correlation Coefficient | -.226           | +.278 **       |
+-------------------------+-----------------+----------------+
** = Significant at stated level of significance

These results led to a further analysis from a
perspective of speaker participation. That is, did the
subject like participating in this type of experimentation
and if so, could it be correlated to total error rate?
Their subjective responses resulted in the creation of three
generic groups as follows:

a. Those who don't care

b. Persons who like to participate

c. Persons who strongly like to participate

In this instance the attainment of a positive correlation,
indicating that those who liked to participate acquire
higher error rates, is counter-intuitive. The null cannot be
rejected based on the computed test statistic given in Table
XXVIII. A correlation of .636 between subject responses to
cooperativeness and participation is not as large as was
expected and as such could, in part, have led to the
divergent results. Whether these results are due to willing
participants trying too hard to perform well and thus
having greater than usual mis- or non-recognitions is
unclear.

4. Recognition Errors

Subjects responded to two questions, one pertaining
to their feelings at the time of a mis-recognition and the
other pertaining to their feelings over a non-recognition
(beep). Their responses to these two questions were
averaged to represent how they felt toward the occurrence of
an error and this led to the creation of two distinct
groups: those who don't like an error to occur and those who
feel they are not disturbed or bothered by an error. The
results of the analysis are summarized in Table XXIX.



TABLE XXIX

EFFECT OF RECOGNITION ERRORS

+-------------------------+--------------+
|                         | ERRORS       |
+-------------------------+--------------+
| Type of Test            | Mann-Whitney |
| Alpha                   | 0.05         |
| Test Statistic          | 612.50       |
| Critical Level          | .0897        |
| Correlation Coefficient | -0.225       |
+-------------------------+--------------+
** = Significant at stated level of significance



The null hypothesis cannot be rejected and, although the
negative correlation coefficient indicates that those who
dislike errors tend to have higher error rates, it is not
significant at an alpha of .05 (Critical Level: .07).

5. Attitudes Toward the Use of Voice

Questions 4, 6 and 8 of User Questionnaire #2 were
used to measure the speaker's attitudes toward voice
technology. The results (Table XXX) indicate a
statistically significant correlation between high error
rates and a favorable attitude toward voice recognition as a
means of saving time and reducing the burden on a computer
operator. Scatter plots of responses to these questions and
associated error rates are depicted in Figures 28-30.
Multiple comparisons between the groups showed differences
between those who would always use voice and those who would
seldom use it despite its pronounced advantages, and between
those who felt that the advantages of voice will give the
keyboard operator other jobs and those who disagree with
such an attitude. Therefore, the null hypothesis cannot be
rejected in terms of a speaker's attitude concerning the
influence on a data processor's job due to voice
recognition. On the other hand, a speaker's willingness to
use voice recognition because of his/her beliefs in its
requisite advantages will affect error rates.

TABLE XXX

[Table body illegible in the source scan.]

Figure 28. Scatter Plot: Mean Error Rate vs. Question #4

Figure 29. Scatter Plot: Mean Error Rate vs. Question #6

Figure 30. Scatter Plot: Mean Error Rate vs. Question #8



As was noted earlier, the presence of a positive
correlation appears to be contrary to popular belief. One
would imagine that a user who believes voice recognition can
make the job of a computer operator easier (Question #4)
would tend toward better recognition accuracy. Questions
six and eight were asked for the purpose of determining if
a user's error rate might be influenced by the subconscious
thought of being encumbered with additional duties because
of the efficiency and effectiveness of voice input. But,
despite the possibility of additional tasks, potential users
still would prefer voice to manual entry. However, the
presence of a significant positive correlation may only be
attributed to the uniqueness of the situation; i.e., as in
speaker participation, subjects who professed a strong
desire to use voice regardless of consequences may have
tried too hard for high accuracy and as a result have failed
to speak in a 'natural' manner.



6. Attitude Toward Computers and Information Processing

In response to two sets of questions, subjects
provided their attitudes surrounding the necessity of
computers in today's society and how voice technology would
aid information processing or data input. Attitudes towards
computers fell into three general categories.

a. Persons who feel computers are unnecessary. 

b. Persons that feel computers are necessary in 
society, but are not a panacea for all problems. 

c. Those who feel that computers are an absolute
necessity. 

Attitudes toward voice recognition and information
processing resulted in four categories.

a. Those believing that voice would take more time
for information or data processing.

b. Those with no opinion.

c. Those who feel voice will save some time.

d. Those who feel voice can save immeasurable time
compared to conventional methods of data entry
and information processing.

Results of the analysis are summarized in Table XXXI. Based
on these results, the null hypothesis cannot be rejected and
thus it may be concluded that the opinion or attitude a
person possesses towards computers, and their feelings
pertaining to voice as a time saving advantage, will not
affect their recognition accuracy.



TABLE XXXI

EFFECT DUE TO ATTITUDES TOWARD COMPUTERS
AND DATA PROCESSING

+-------------------------+----------------+-----------------+
|                         | COMPUTERS      | DATA PROCESSING |
+-------------------------+----------------+-----------------+
| Type of Test            | Kruskal-Wallis | Kruskal-Wallis  |
| Alpha                   | .05            | .05             |
| Test Statistic          | .78            | [illegible]     |
| Critical Level          | > .8           | .15             |
| Correlation Coefficient | .111           | -.164           |
+-------------------------+----------------+-----------------+
** = Significant at stated level of significance



G. VOCABULARY ERRORS



As a result of using different numbers of syllables in
the vocabulary, it was also possible to get an indication of
how well utterances with different numbers of syllables were
recognized. Originally done in a longitudinal study [Ref.
24: pp. 9-10], it is analyzed within the context of this
document as further verification of these earlier results.
This is shown by weeks in Figure 31 and over all conditions
in Figure 32. Both figures illustrate a generally declining
error rate as a function of the number of syllables in the
utterance. Although the current experimentation yielded an
approximately 1.5 percent rise in error rate from three to
four syllables, it is not a large deviation from the earlier
study, which indicated little change in error rates between
three- and four-syllable words.

In terms of overall effectiveness, a practical
application would dictate the least number of recognition
errors. Therefore, an error rate of 5.91% still remains two
to three percent better than utterances with a smaller
syllabic content. Despite the higher rate for four syllable
compared to five syllable words, the difference is still
less than that of one to four or two to four syllables. The
variety of vocabulary items used in this experiment further
confirms the argument that through a careful and judicious
selection of vocabulary items, large vocabulary difficulties
and associated high error rates may be reduced.



Figure 31. Mean Error Rate vs. # Syllables (by Week)
(x-axis: # of syllables in utterance; y-axis: mean error rate)

Figure 32. Mean Error Rate vs. # Syllables (Overall)
(x-axis: # of syllables in utterance; y-axis: mean error rate)



VI. CONCLUSIONS 



Following the lengthy elaboration of results in the
previous section it would be helpful to recapitulate, in a
brief summary form, the responses of the different variables
tested. Variables resulting in a statistically significant
test statistic included:

-- Method of training

-- Experience of the user

-- Previous computer experience

-- Level of education (all subjects)

-- Vital capacity

-- Speaker cooperativeness

The following variables produced a significant 
correlation between itself anc recognition error rate. 

— Previous corrputer experience 

— Tine of the week 

— Experience of the user 

— Level of education (all subjects) 

— Speaker participation 

-- Vital capacity 

— Rate of air flow 

— State anxiety (first week only) 

-- User attitudes pertaining to voice 



141 



The following variables resulted in a nonsignificant
test statistic and/or correlation coefficient:

-- Job function
-- Branch of service
-- Job satisfaction
-- Service satisfaction
-- Foreign language competency
-- Time of day
-- Time of week (test statistic only)
-- Ease of use of voice equipment
-- Level of education (naive users)
-- Socio-economic class
-- Dental care
-- Race
-- Marital status and family size
-- Religious preference
-- Accent
-- Place of birth/geographic origin
-- Age
-- Height and weight
-- Rate of airflow (test statistic)
-- Physical conditioning/speech training
-- Anxiety: State and Trait
-- Speaker cooperativeness (correlation)
-- Speaker participation (test statistic)
-- Affect of recognition errors
-- Attitudes toward computers/data processing

The wide range in error rates, .50 to 15.7 percent, for
the individual subjects (see Appendix J for a complete
summary) indicates an obvious variability between subjects.
Within the context of the main experiment and the associated
ANOVA, the three variables of job function, training method,
and experience (trials) are independent events and are
protected from confounding by the experimental design.
The selection of a level of significance equal to .05 is
merely to show the possible existence of some effect, not to
demonstrate a rigorous test of a stated hypothesis. As the
analysis progresses to the extraction of numerous other
human factors, these protections and the accompanying power
of a parametric test are reduced. In some instances an
awareness of a possible dependence between conditions is
necessary prior to reaching an ultimate conclusion. For
example, were those subsets of a category achieving
statistical significance also trained with supervision
and/or experienced users and, if so, how many were in that
particular subset?
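As a rough illustration of how a correlation between a human factor and recognition error rate might be screened at the .05 level, the sketch below computes a product-moment coefficient and its t statistic. It assumes a Pearson coefficient, which this section does not specify; the function names and the sample values are hypothetical, and the critical value 2.776 is the standard two-tailed t-table entry for 4 degrees of freedom.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Made-up values, NOT data from the experiment:
# trial number versus error rate (percent) for one hypothetical subject.
trials = [1, 2, 3, 4, 5, 6]
error_rates = [9.0, 7.5, 6.0, 5.5, 4.0, 3.5]

r = pearson_r(trials, error_rates)
t = t_statistic(r, len(trials))

# Two-tailed critical t at the .05 level for 6 - 2 = 4 degrees of freedom.
T_CRITICAL_05_DF4 = 2.776
significant = abs(t) > T_CRITICAL_05_DF4
```

A significant coefficient here only flags a possible effect; as noted above, it does not rule out dependence on training method or experience within the subset being examined.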

The results presented herein suggest that speaker
variability would not affect recognition accuracy to such an
extent as to restrict its use to specially selected
users. For implementation in military applications, this
proves to be especially satisfying since it would free the
services from the necessity of classifying personnel into
particular military occupational specialties or
subspecialties for the express purpose of operating voice
equipment. It is apparent from the experimentation, and the
diversity of skills and experience contained within the
sample population, that practically anyone may be a
potential candidate to operate voice recognition equipment.

The phrase 'practically anyone' should be qualified
here. Interspeaker variability had a significant impact in
the case of one subject, who possessed a severe speech
impairment: stuttering. It became obvious in the early
stages of training that he would be unable to finish the
training phase. In fact, after 30 minutes, only 11
utterances had been satisfactorily placed into memory.
Although the individual was eliminated as an experimental
subject, his difficulty demonstrates that although most
anyone can use this type of technology, there will always
exist those, albeit few in number, who for one reason or
another are unable to attain a suitable level of recognition
accuracy.

The current experimentation has clearly shown that
experience and method of training voice equipment can
provide excellent recognition accuracy rates. Of course,
what determines an 'excellent' rate is purely subjective and
dependent upon the application in which it is employed. What
makes this observation readily appealing is that both
characteristics are controlled by the human. They are not
factors that one is born with or has inherited. Rather,
with closely supervised training procedures by an
experienced operator, a 'naive' user can quickly attain
recognition rates greater than 95 percent and, with
repetitive experience, increase this accuracy until errors
are reduced to less than two percent. It must be reiterated
that in the present experiment, subjects were not allowed to
retrain the recognizer during the three weeks of recognition
testing. In actuality, the speaker would retrain an
utterance rather than continue incurring mis- or
non-recognition errors.

To a lesser degree, speaker cooperativeness and amount
of previous computer experience are definitely factors to be
considered. The latter characteristic influences the
personnel selection process while speaker cooperativeness,
like training and experience, can be influenced by the human
element. Certainly, because of data processing experience,
such individuals can readily identify with the advantages of
speech input and thereby become more or highly cooperative
speakers. Thus combined, these two factors strongly support
the potential for achieving high recognition accuracy.

The presence of occasional positive correlation
coefficients that were statistically significant is
difficult to explain or resolve conclusively. Such
instances as level of participation, desire to use voice,
and attitudes pertaining to voice provided misleading
results. It was surmised that speakers who are willing
participants and find voice to be a technology that they
would likely use would achieve low error rates. The
observation to the contrary suggests that many of those
speakers tried too hard for perfect recognition accuracy
and, as a result, were less apt to speak naturally. In
effect, they were trying to outsmart the machine.

Thus, in an operational environment it becomes incumbent
upon both the speaker and the supervisor to fully embrace
the concept of voice technology for use in a practical
application. In demonstrations at the Naval Postgraduate
School it is frequently noted that observers are genuinely
impressed with the capabilities of voice input of data until
that one error, sometimes after more than 2Z0 successfully
recognized utterances, occurs and they sit back and remark
that perhaps "additional research is needed prior to placing
it into operational use". It is obvious that voice
technology is acceptable for use in a military command
center and must be fully supported by the Commander and his
Staff. If it is, error rates can be minimized by human
controls such as training and experience. In conclusion,
consistency may best describe the key to speaker
variability. Attitudes, training, and experience together
produce consistency in speech, and consistency generates a
continued high recognition accuracy rate.



APPENDIX A



USER QUESTIONNAIRE #1



NAME:                              SUBJECT#:

INSTRUCTIONS:

The purpose of this questionnaire is to obtain information
from you regarding physical characteristics, personal
background, and opinions pertaining to voice recognition
equipment and its use. Your answers will assist in
determining whether personal and/or physiological traits
contribute to effective utilization of voice recognition
equipment.

The questions include multiple choice, YES/NO, rating scale
and short answer (one or two words ONLY!) types.
Appropriate guidance accompanies each question or block of
questions.

Your name is NOT required but is requested in order to ease
the necessary correlation of your replies with your results
in the experimentation. If you desire anonymity, please
respond with your subject number only. Please respond
truthfully. Check your questionnaire after completion to
ensure you've completed all the questions.

Thank you for your assistance in this experiment.



In questions 1 - 22, provide either a one or two word
response, or place an 'X' by the appropriate answer.

1. What is your age?

2. What is your height (in inches)?

3. What is your weight?

4. What is your race?

        White (Caucasian)
        Yellow (Asian/Mongoloid)
        Black (Negroid/African)
        Red (American Indian)

5. What is your nationality?

        Native Citizen of the United States
        Naturalized Citizen of the United States
        Alien

6. What is your religious preference?

        (See Attached Sheet)

7. What is your ethnic background?

        Puerto Rican
        Filipino
        Mexican
        Cuban
        Latin American (persons from Central or S. America)
        Other Hispanic Descent (Extraction not delineated
            as Mexican, Puerto Rican, Cuban or Latin American)
        Eskimo
        Aleut
        Indian
        Melanesian
        Chinese
        Japanese
        Korean
        Polynesian
        Vietnamese
        Other Asian Descent (Extraction not delineated as
            Chinese, Japanese, Korean, Indian, Filipino, or
            Vietnamese)
        None of the Above
        Other (Please specify            )



8. Do you have an accent?

        YES (what kind?            )
        NO

9. What is your Marital Status?

        Married
        Divorced
        Single
        Other (separated, widowed)

10. How many children do you have?

        0
        1
        >4

11. Do you wear glasses?

        YES
        NO

12. Have you ever had orthodontist care &/or wear/worn
braces?

        YES
        NO

13. What is your level of education?

        Non High School Graduate
        High School Graduate
        Associate Degree
        1 year of college
        2 years of college
        3 years of college
        4 years of college (no degree)
        College graduate (BA/BS)
        Graduate work of more than 1 year (no degree)
        Masters Degree received
        Doctorate Degree received

14. What state were you born in?

15. During ages 1-18, in what state did you principally
reside?



16. What has been your state of residence for the majority
of the last three years?

17. Do you speak any foreign language(s)?

        YES [which one(s)            ]
        NO

18. What is your branch of service?

        Navy
        Army
        Marine Corps
        Air Force
        Other (civilian)

19. How many years have you been in the service?

20. Have you ever been overseas for more than 13
consecutive months? (not including leave or vacation)

        YES (go to question #21)
        NO (go to question #22)

21. How many months were you overseas?

        In what country?

22. What do you consider to be your socioeconomic class?

        Lower Class
        Upper Lower Class
        Lower Middle Class
        Middle Class
        Upper Middle Class
        Lower Upper Class
        Upper Class



In questions 23 - 40 place an 'X' on a point on the scale
that best indicates or describes your feelings. The 'X' may
be placed anywhere along the scale.

23. How do you feel about the job position you currently
have?

    |--------------|--------------|--------------|--------------|
    LIKE VERY      LIKE           NEUTRAL        DISLIKE        DISLIKE
    MUCH                                                        VERY MUCH

24. How much satisfaction do you derive from being a member
of the Armed Services?

    |--------------|--------------|--------------|--------------|
    VERY           SATISFIED      NEUTRAL        UNSATISFIED    VERY
    SATISFIED                                                   UNSATISFIED

25. Computers are necessary in today's society.

    |--------------|--------------|--------------|--------------|
    DECIDEDLY      SLIGHTLY       NO OPINION     SLIGHTLY       DECIDEDLY
    AGREE          AGREE          DON'T KNOW     DISAGREE       DISAGREE

26. How would voice recognition make a computer operator's
job?

    |--------------|--------------|--------------|--------------|
    MUCH           SOMEWHAT       NO OPINION     MORE           MUCH MORE
    EASIER         EASIER                        DIFFICULT      DIFFICULT



27. How would voice recognition equipment affect
information processing or data input?

    |--------------|--------------|--------------|--------------|
    SAVE A LOT     SAVE SOME      NO OPINION     TAKES MORE     TAKES A LOT
    OF TIME        TIME           DON'T KNOW     TIME           MORE TIME

28. If voice recognition can save time, it would allow a
keyboard operator to do other jobs.

    |--------------|--------------|--------------|--------------|
    DECIDEDLY      SLIGHTLY       NO OPINION     SLIGHTLY       DECIDEDLY
    AGREE          AGREE          DON'T KNOW     DISAGREE       DISAGREE

29. Describe the use of voice recognition equipment.

    |--------------|--------------|--------------|--------------|
    VERY EASY      EASY TO        NO OPINION     DIFFICULT      VERY DIFFICULT
    TO USE         USE                           TO USE         TO USE

30. What do you think of voice recognition equipment for
use in Military Command Centers?

    |--------------|--------------|--------------|--------------|
    VERY           SOMEWHAT       NO OPINION     SOMEWHAT       VERY
    PRACTICAL      PRACTICAL      DON'T KNOW     IMPRACTICAL    IMPRACTICAL

31. How much previous computer experience have you had?

    |--------------|--------------|--------------|--------------|
    A LOT OF       CONSIDERABLE   SOME           VERY LITTLE    NO
    EXPERIENCE     EXPERIENCE     EXPERIENCE     EXPERIENCE     EXPERIENCE



32. What is your previous experience with voice recognition
equipment?

    |--------------|--------------|--------------|--------------|
    VERY MUCH      MUCH           SOME           A LITTLE       NONE

33. How would additional experience with voice recognition
equipment affect recognition accuracy?

    |--------------|--------------|--------------|--------------|
    MUCH           SOME           NO OPINION     A LITTLE       NO
    IMPROVEMENT    IMPROVEMENT                   IMPROVEMENT    IMPROVEMENT

34. How do you feel when a misrecognition occurs?

    |--------------|--------------|--------------|--------------|
    STRONGLY       LIKE           NEUTRAL        DISLIKE        STRONGLY
    LIKE                                                        DISLIKE

35. How do you feel when a non-recognition ('beep') occurs?

    |--------------|--------------|--------------|--------------|
    STRONGLY       LIKE           NEUTRAL        DISLIKE        STRONGLY
    LIKE                                                        DISLIKE



36. How do you feel when a recognition occurs?

    |--------------|--------------|--------------|--------------|
    STRONGLY       LIKE           NEUTRAL        DISLIKE        STRONGLY
    LIKE                                                        DISLIKE

37. Describe your participation in this experiment.

    |--------------|--------------|--------------|--------------|
    EXTREMELY      MODERATELY     SOMEWHAT       UNCOOPERATIVE  VERY
    COOPERATIVE    COOPERATIVE    COOPERATIVE                   UNCOOPERATIVE

38. How would you describe your participating in this type
of experimentation?

    |--------------|--------------|--------------|--------------|
    STRONGLY       LIKE           NEUTRAL        DISLIKE        STRONGLY
    LIKE                                                        DISLIKE

39. What is your current physical condition?

    |--------------|--------------|--------------|--------------|
    OUTSTANDING    GOOD           AVERAGE        FAIR           POOR

40. If voice recognition does save time and allows YOU to
be assigned other tasks, how often would YOU want to use it?

    |--------------|--------------|--------------|--------------|
    ALWAYS         FREQUENTLY     NOW AND THEN   SELDOM         NEVER



APPENDIX B



USER QUESTIONNAIRE #2



NAME:                              SUBJECT#:

INSTRUCTIONS:

The purpose of this questionnaire is to obtain information
from you regarding physical characteristics, personal
background, and opinions pertaining to voice recognition
equipment and its use. Your answers will assist in
determining whether personal and/or physiological traits
contribute to effective utilization of voice recognition
equipment.

The questions include multiple choice, YES/NO, rating scale
and short answer (one or two words ONLY!) types.
Appropriate guidance accompanies each question or block of
questions.

Your name is NOT required but is requested in order to ease
the necessary correlation of your replies with your results
in the experimentation. If you desire anonymity, please
respond with your subject number only. Please respond
truthfully. Check your questionnaire after completion to
ensure you've completed all the questions.

Thank you for your assistance in this experiment.



In questions 1 - 3, provide either a one or two word
response, or place an 'X' by the appropriate answer.

1. Have you ever had one or more of the following speech
impediments and/or impairments?

        Articulation (difficulty in pronouncing vowels
            and/or consonants)
        Voice (irregularities in the larynx)
        Cleft lip and/or lip palate
        Cerebral palsy
        Stuttering
        Hearing impairments
        Aphasia
        Congenital speech defects (due to birth/pregnancy)
        Retardation
        None of the above

2. Have you ever received speech therapy from either a
subsidized (free) clinic, private speech therapist, or
through the public school system?

        YES
        NO

3. Have you ever received voice training or taken singing
lessons?

        YES (How many years?        )
        NO



In questions 4 - 15 place an 'X' on a point on the scale
that best indicates or describes your feelings. The 'X' may
be placed anywhere along the scale.

4. How would voice recognition make a computer operator's
job?

    |--------------|--------------|--------------|--------------|
    MUCH           SOMEWHAT       NO OPINION     MORE           MUCH MORE
    EASIER         EASIER                        DIFFICULT      DIFFICULT

5. How would voice recognition equipment affect
information processing or data input?

    |--------------|--------------|--------------|--------------|
    SAVE A LOT     SAVE SOME      NO OPINION     TAKES MORE     TAKES A LOT
    OF TIME        TIME           DON'T KNOW     TIME           MORE TIME

6. If voice recognition can save time, it would allow a
keyboard operator to do other jobs.

    |--------------|--------------|--------------|--------------|
    DECIDEDLY      SLIGHTLY       NO OPINION     SLIGHTLY       DECIDEDLY
    AGREE          AGREE          DON'T KNOW     DISAGREE       DISAGREE

7. Describe the use of voice recognition equipment.

    |--------------|--------------|--------------|--------------|
    VERY EASY      EASY TO        NO OPINION     DIFFICULT      VERY DIFFICULT
    TO USE         USE                           TO USE         TO USE



8. If voice recognition does save time and allows YOU to
be assigned other tasks, how often would YOU want to use it?

    |--------------|--------------|--------------|--------------|
    ALWAYS         FREQUENTLY     NOW AND THEN   SELDOM         NEVER

9. How would additional experience with voice recognition
equipment affect recognition accuracy?

    |--------------|--------------|--------------|--------------|
    MUCH           SOME           NO OPINION     A LITTLE       NO
    IMPROVEMENT    IMPROVEMENT                   IMPROVEMENT    IMPROVEMENT

10. How do you feel when a misrecognition occurs?

    |--------------|--------------|--------------|--------------|
    STRONGLY       LIKE           NEUTRAL        DISLIKE        STRONGLY
    LIKE                                                        DISLIKE

11. How do you feel when a non-recognition ('beep') occurs?

    |--------------|--------------|--------------|--------------|
    STRONGLY       LIKE           NEUTRAL        DISLIKE        STRONGLY
    LIKE                                                        DISLIKE



12. How do you feel when a recognition occurs?

    |--------------|--------------|--------------|--------------|
    STRONGLY       LIKE           NEUTRAL        DISLIKE        STRONGLY
    LIKE                                                        DISLIKE

13. Describe your participation in this experiment.

    |--------------|--------------|--------------|--------------|
    EXTREMELY      MODERATELY     SOMEWHAT       UNCOOPERATIVE  VERY
    COOPERATIVE    COOPERATIVE    COOPERATIVE                   UNCOOPERATIVE

14. How would you describe your participating in this type
of experimentation?

    |--------------|--------------|--------------|--------------|
    STRONGLY       LIKE           NEUTRAL        DISLIKE        STRONGLY
    LIKE                                                        DISLIKE

15. What do you think of voice recognition equipment for
use in Military Command Centers?

    |--------------|--------------|--------------|--------------|
    VERY           SOMEWHAT       NO OPINION     SOMEWHAT       VERY
    PRACTICAL      PRACTICAL      DON'T KNOW     IMPRACTICAL    IMPRACTICAL



APPENDIX C



SELF-EVALUATION QUESTIONNAIRE

NAME                    DATE                    SUBJECT#

DIRECTIONS: A number of statements which people have used
to describe themselves are given below. Read each statement
and then circle the appropriate number to the right of the
statement that indicates how you GENERALLY feel. There are
no right or wrong answers. Please do not spend too much
time on any one statement, but give the answer which seems
to describe how you GENERALLY feel.

        1 = ALMOST NEVER
        2 = SOMETIMES
        3 = OFTEN
        4 = ALMOST ALWAYS

 1. I feel pleasant                                  1  2  3  4
 2. I tire quickly                                   1  2  3  4
 3. I feel like crying                               1  2  3  4
 4. I wish I could be as happy as                    1  2  3  4
    others seem to be
 5. I am losing out on things because                1  2  3  4
    I can't make up my mind soon
    enough
 6. I feel rested                                    1  2  3  4
 7. I am "calm, cool, and collected"                 1  2  3  4
 8. I feel that difficulties are                     1  2  3  4
    piling up so that I cannot
    overcome them
 9. I worry too much over something                  1  2  3  4
    that really doesn't matter
10. I am happy                                       1  2  3  4
11. I am inclined to take things hard                1  2  3  4
12. I lack self confidence                           1  2  3  4
13. I feel secure                                    1  2  3  4
14. I try to avoid facing a crisis                   1  2  3  4
    or difficulty
15. I feel blue                                      1  2  3  4
16. I am content                                     1  2  3  4
17. Some unimportant thought runs                    1  2  3  4
    through my mind and bothers me
18. I take disappointments so keenly                 1  2  3  4
    that I can't put them out of my
    mind
19. I am a steady person                             1  2  3  4
20. I get in a state of tension or                   1  2  3  4
    turmoil as I think over my recent
    concerns and interests



SCORING KEY
for the
A-TRAIT EVALUATION



APPENDIX D



SELF-EVALUATION QUESTIONNAIRE

NAME                    DATE                    SUBJECT#

DIRECTIONS: A number of statements which people have used
to describe themselves are given below. Read each statement
and then circle the appropriate number to the right of the
statement that indicates how you feel RIGHT NOW, AT THIS
VERY MOMENT. There are no right or wrong answers. Please
do not spend too much time on any one statement, but give
the answer that best describes your PRESENT feelings.

        1 = NOT AT ALL
        2 = SOMEWHAT
        3 = MODERATELY SO
        4 = VERY MUCH SO

 1. I feel calm                                      1  2  3  4
 2. I feel secure                                    1  2  3  4
 3. I am tense                                       1  2  3  4
 4. I am regretful                                   1  2  3  4
 5. I feel at ease                                   1  2  3  4
 6. I feel upset                                     1  2  3  4
 7. I am presently worrying                          1  2  3  4
    over possible misfortunes
 8. I feel rested                                    1  2  3  4
 9. I feel anxious                                   1  2  3  4
10. I feel comfortable                               1  2  3  4
11. I feel self-confident                            1  2  3  4
12. I feel nervous                                   1  2  3  4
13. I am jittery                                     1  2  3  4
14. I feel "high strung"                             1  2  3  4
15. I am relaxed                                     1  2  3  4
16. I feel content                                   1  2  3  4
17. I am worried                                     1  2  3  4
18. I feel over-excited                              1  2  3  4
    and "rattled"
19. I feel joyful                                    1  2  3  4
20. I feel pleasant                                  1  2  3  4



SCORING KEY
for the
A-STATE EVALUATION



 1.    4  3  2  1
 2.    4  3  2  1
 3.    1  2  3  4
 4.    1  2  3  4
 5.    4  3  2  1
 6.    1  2  3  4
 7.    1  2  3  4
 8.    4  3  2  1
 9.    1  2  3  4
10.    4  3  2  1
11.    4  3  2  1
12.    1  2  3  4
13.    1  2  3  4
14.    1  2  3  4
15.    4  3  2  1
16.    4  3  2  1
17.    1  2  3  4
18.    1  2  3  4
19.    4  3  2  1
20.    4  3  2  1
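The key above assigns each circled response (1 to 4) a weight, with the weights running 4 down to 1 for the positively worded items and 1 up to 4 for the rest. A minimal sketch of applying such a key is shown below. It assumes the A-State score is the simple sum of the keyed weights, which this appendix does not state explicitly; the names `REVERSED_ITEMS` and `score_a_state` are hypothetical, and the reversed-item set is read from the rows of the key whose first weight is 4.

```python
# Items whose key row starts with a weight of 4 (responses are reversed).
REVERSED_ITEMS = {1, 2, 5, 8, 10, 11, 15, 16, 19, 20}


def score_a_state(responses):
    """Sum the keyed weights for 20 circled responses (each 1 to 4).

    Assumption: the total score is the plain sum of the weights.
    """
    if len(responses) != 20:
        raise ValueError("expected 20 responses")
    total = 0
    for item, answer in enumerate(responses, start=1):
        if answer not in (1, 2, 3, 4):
            raise ValueError(f"item {item}: response must be 1-4")
        # Reversed items map 1->4, 2->3, 3->2, 4->1; others score as circled.
        total += (5 - answer) if item in REVERSED_ITEMS else answer
    return total


# Hypothetical respondents, not subjects from the experiment.
calm = [4 if i in REVERSED_ITEMS else 1 for i in range(1, 21)]
tense = [1 if i in REVERSED_ITEMS else 4 for i in range(1, 21)]
lowest, highest = score_a_state(calm), score_a_state(tense)
```

Under this summing assumption the possible totals run from 20 to 80, with higher totals indicating more reported anxiety at the moment of testing.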






APPENDIX E



UTTERANCE LIST: TRAINING WEEK - WEEK #1



WORD#   UTTERANCE                    CRT PROMPT

000     THREE                        THREE
001     EUROPE                       EUROPE
002     MOVE IT LEFT                 MOVE IT LEFT
003     CARRIAGE RETURN              CARR RETURN
004     LOGOUT                       LOGOUT
005     COMMAND                      COMMAND
006     STRAIT OF HORMUZ             STR OF HMRZ
007     TIME                         TIME
008     KOREA                        KOREA
009     ZERO                         ZERO
010     CHANGE DIRECTORY TO POOCK    C DIR TO PK
011     ALPHA                        ALPHA
012     POSITIVE                     POSITIVE
013     IDENTIFICATION               UNIFICATION
014     LAUNCH                       LAUNCH
015     RELOCATE                     RELOCATE
016     DELTA                        DELTA
017     TASK FORCE COMMANDER         TSK FRC CDR
018     KILO                         KILO
019     LOGIN YELLEN                 LOGIN YELLEN
020     ECHO                         ECHO
021     NOVEMBER                     NOVEMBER
022     TWO                          TWO
023     UNITED STATES                UNITED STS
024     FOUR                         FOUR
025     BRAVO                        BRAVO
026     PLACE A CIRCLE ON MOSCOW     PL A CIR MOS
027     ENEMY DETECTION              EN DETECTION
028     PROCEED                      PROCEED
029     ROMEO                        ROMEO
030     FLIGHT CONTROLLER            FLT CTLR
031     SEVEN                        SEVEN
032     GROUND CONTROL APPROACH      GND CTL APPR
033     REPORT                       REPORT
034     AIRFIELD NAME                AFLD NAME
035     LIMA                         LIMA
036     AVAILABLE                    AVAILABLE
037     MESSAGE                      MESSAGE
038     SATELLITE                    SATELLITE
039     SHOOT                        SHOOT
040     YANKEE                       YANKEE
041     AFFIRMATIVE                  AFFIRMATIVE



042     CHARLIE                      CHARLIE
043     TORPEDO                      TORPEDO
044     FIVE                         FIVE
045     OPERATIONS PLAN              OPNS PLAN
046     OFFENSE                      OFFENSE
047     UP IN DETAIL                 UP IN DETAIL
048     NINE                         NINE
049     PROBABILITY OF DETECTION     PROB OF DETN
050     NEUTRAL                      NEUTRAL
051     JULIETT                      JULIETT
052     SPEED                        SPEED
053     UNIFORM                      UNIFORM
054     SENSOR                       SENSOR
055     TANGO                        TANGO
056     CLOSE OUT CHARLIE            CLS OUT CHRL
057     LOAD THE GANN                LD THE GANN
058     OSCAR                        OSCAR
059     NORTH ATLANTIC MAP           N ATL MAP
060     PACIFIC DATA BASE            PAC DAT BASE
061     HUMAN FACTORS                HUM FACTORS
062     FOXTROT                      FOXTROT
063     SOVIET                       SOVIET
064     DEFENSE                      DEFENSE
065     ONE                          ONE
066     INDIA                        INDIA
067     ADVANTAGES                   ADVANTAGES
068     GOLF                         GOLF
069     CANCEL                       CANCEL
070     ZULU                         ZULU
071     NEGATIVE                     NEGATIVE
072     PLOT ALL SUBMARINES          PLT ALL SUBS
073     XRAY                         XRAY
074     REFUEL                       REFUEL
075     AUTOMATIC RECOGNITION        AUTO RECOG
076     QUEBEC                       QUEBEC
077     TRACK ENEMY                  TRACK ENEMY
078     LEVEL TWO                    LEVEL TWO
079     COURSE                       COURSE
080     JOINT TASK FORCE             JT TSK FRC
081     SIX                          SIX
082     WHISKEY                      WHISKEY
083     ATTACK                       ATTACK
084     SIERRA                       SIERRA
085     MANEUVER DELAY               MNUVR DELAY
086     DISTANCE                     DISTANCE
087     EXECUTE                      EXECUTE
088     EIGHT                        EIGHT
089     VICTOR                       VICTOR
090     MEDITERRANEAN MAP            MED MAP
091     SEA OF JAPAN                 SEA OF JAPN
092     POPPA                        POPPA
093     FILE TRANSFER PROTOCOL       FL TNSFR PRO



094     ALTITUDE                     ALTITUDE
095     HOTEL                        HOTEL
096     NUKE THEM TILL THEY GLOW     NUKE EM
097     ACCAT TITLE                  ACCAT TITLE
098     MIKE                         MIKE
099     MISSILE                      MISSILE



APPENDIX F



UTTERANCE LIST: WEEK #2



WORD#   UTTERANCE

000     MISSILE
001     MIKE
002     ACCAT TITLE
003     NUKE THEM TILL THEY GLOW
004     HOTEL
005     ALTITUDE
006     FILE TRANSFER PROTOCOL
007     POPPA
008     SEA OF JAPAN
009     MEDITERRANEAN MAP
010     VICTOR
011     EIGHT
012     EXECUTE
013     DISTANCE
014     MANEUVER DELAY
015     SIERRA
016     ATTACK
017     WHISKEY
018     SIX
019     JOINT TASK FORCE
020     COURSE
021     LEVEL TWO
022     TRACK ENEMY
023     QUEBEC
024     AUTOMATIC RECOGNITION
025     REFUEL
026     XRAY
027     PLOT ALL SUBMARINES
028     NEGATIVE
029     ZULU
030     CANCEL
031     GOLF
032     ADVANTAGES
033     INDIA
034     ONE
035     DEFENSE
036     SOVIET
037     FOXTROT
038     HUMAN FACTORS
039     PACIFIC DATA BASE
040     NORTH ATLANTIC MAP
041     OSCAR
042     LOAD THE GANN
043     CLOSE OUT CHARLIE
044     TANGO
045     SENSOR
046     UNIFORM
047     SPEED
048     JULIETT
049     NEUTRAL
050     PROBABILITY OF DETECTION
051     NINE
052     UP IN DETAIL
053     OFFENSE
054     OPERATIONS PLAN
055     FIVE
056     TORPEDO
057     CHARLIE
058     AFFIRMATIVE
059     YANKEE
060     SHOOT
061     SATELLITE
062     MESSAGE
063     AVAILABLE
064     LIMA
065     AIRFIELD NAME
066     REPORT
067     GROUND CONTROL APPROACH
068     SEVEN
069     FLIGHT CONTROLLER
070     ROMEO
071     PROCEED
072     ENEMY DETECTION
073     PLACE A CIRCLE ON MOSCOW
074     BRAVO
075     FOUR
076     UNITED STATES
077     TWO
078     NOVEMBER
079     ECHO
080     LOGIN YELLEN
081     KILO
082     TASK FORCE COMMANDER
083     DELTA
084     RELOCATE
085     LAUNCH
086     IDENTIFICATION
087     POSITIVE
088     ALPHA
089     CHANGE DIRECTORY TO POOCK
090     ZERO
091     KOREA
092     TIME
093     STRAIT OF HORMUZ



094     COMMAND
095     LOGOUT
096     CARRIAGE RETURN
097     MOVE IT LEFT
098     EUROPE
099     THREE



APPENDIX G



UTTERANCE LIST: WEEK #3



WORD#   UTTERANCE

000     CARRIAGE RETURN
001     STRAIT OF HORMUZ
002     ZERO
003     POSITIVE
004     RELOCATE
005     KILO
006     NOVEMBER
007     FOUR
008     ENEMY DETECTION
009     FLIGHT CONTROLLER
010     REPORT
011     AVAILABLE
012     SHOOT
013     CHARLIE
014     OPERATIONS PLAN
015     NINE
016     JULIETT
017     SENSOR
018     LOAD THE GANN
019     PACIFIC DATA BASE
020     SOVIET
021     INDIA
022     CANCEL
023     PLOT ALL SUBMARINES
024     AUTOMATIC RECOGNITION
025     LEVEL TWO
026     SIX
027     SIERRA
028     EXECUTE
029     MEDITERRANEAN MAP
030     FILE TRANSFER PROTOCOL
031     NUKE THEM TILL THEY GLOW
032     MISSILE
033     MOVE IT LEFT
034     COMMAND
035     KOREA
036     ALPHA
037     LAUNCH
038     TASK FORCE COMMANDER
039     ECHO
040     UNITED STATES
041     PLACE A CIRCLE ON MOSCOW



042     ROMEO
043     GROUND CONTROL APPROACH
044     LIMA
045     SATELLITE
046     AFFIRMATIVE
047     FIVE
048     UP IN DETAIL
049     NEUTRAL
050     UNIFORM
051     CLOSE OUT CHARLIE
052     NORTH ATLANTIC MAP
053     FOXTROT
054     ONE
055     GOLF
056     NEGATIVE
057     REFUEL
058     TRACK ENEMY
059     JOINT TASK FORCE
060     ATTACK
061     DISTANCE
062     VICTOR
063     POPPA
064     HOTEL
065     MIKE
066     EUROPE
067     LOGOUT
068     TIME
069     CHANGE DIRECTORY TO POOCK
070     IDENTIFICATION
071     DELTA
072     LOGIN YELLEN
073     THREE
074     TWO
075     BRAVO
076     PROCEED
077     SEVEN
078     AIRFIELD NAME
079     MESSAGE
080     YANKEE
081     TORPEDO
082     OFFENSE
083     PROBABILITY OF DETECTION
084     SPEED
085     TANGO
086     OSCAR
087     HUMAN FACTORS
088     DEFENSE
089     ADVANTAGES
090     ZULU
091     XRAY
092     QUEBEC
093     COURSE



094     WHISKEY
095     MANEUVER DELAY
096     EIGHT
097     SEA OF JAPAN
098     ALTITUDE
099     ACCAT TITLE



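APPENDIX H



DATA COLLECTION FORM

NAME:                    SEX:  M  F          SUBJECT #:

RANK:                    DAY/TIME:           [TRIALS 1-2]
                                             [TRIALS 3-4]
                                             [TRIALS 5-6]

WEEK#:  1  2  3

MICROPHONE:   EXPERIENCED    NON-EXPERIENCED

TRAINING:     SUPERVISED     NON-SUPERVISED



                        TRIAL #
| UTTERANCE           | 1 | 2 | 3 | 4 | 5 | 6 |
| THREE               |   |   |   |   |   |   |
| EUROPE              |   |   |   |   |   |   |
| MOVE IT LEFT        |   |   |   |   |   |   |
| CARRIAGE RETURN     |   |   |   |   |   |   |
| LOGOUT              |   |   |   |   |   |   |
| COMMAND             |   |   |   |   |   |   |
| STRAIT OF HORMUZ    |   |   |   |   |   |   |
| TIME                |   |   |   |   |   |   |
| KOREA               |   |   |   |   |   |   |
| ZERO                |   |   |   |   |   |   |
| CHG DIR TO POOCK    |   |   |   |   |   |   |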



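| ALPHA               |   |   |   |   |   |   |
| POSITIVE            |   |   |   |   |   |   |
| IDENTIFICATION      |   |   |   |   |   |   |
| LAUNCH              |   |   |   |   |   |   |
| RELOCATE            |   |   |   |   |   |   |
| DELTA               |   |   |   |   |   |   |
| TASK FORCE CMDR     |   |   |   |   |   |   |
| KILO                |   |   |   |   |   |   |
| LOGIN YELLEN        |   |   |   |   |   |   |
| ECHO                |   |   |   |   |   |   |
| NOVEMBER            |   |   |   |   |   |   |
| TWO                 |   |   |   |   |   |   |
| UNITED STATES       |   |   |   |   |   |   |
| FOUR                |   |   |   |   |   |   |
| BRAVO               |   |   |   |   |   |   |
| PL CIRCLE ON MOSCOW |   |   |   |   |   |   |
| ENEMY DETECTION     |   |   |   |   |   |   |
| PROCEED             |   |   |   |   |   |   |
| ROMEO               |   |   |   |   |   |   |
| FLIGHT CONTROLLER   |   |   |   |   |   |   |
| SEVEN               |   |   |   |   |   |   |
| GRND CTRL APPROACH  |   |   |   |   |   |   |
| REPORT              |   |   |   |   |   |   |
| AIRFIELD NAME       |   |   |   |   |   |   |
| LIMA                |   |   |   |   |   |   |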



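| AVAILABLE           |   |   |   |   |   |   |
| MESSAGE             |   |   |   |   |   |   |
| SATELLITE           |   |   |   |   |   |   |
| SHOOT               |   |   |   |   |   |   |
| YANKEE              |   |   |   |   |   |   |
| AFFIRMATIVE         |   |   |   |   |   |   |
| CHARLIE             |   |   |   |   |   |   |
| TORPEDO             |   |   |   |   |   |   |
| FIVE                |   |   |   |   |   |   |
| OPERATIONS PLAN     |   |   |   |   |   |   |
| OFFENSE             |   |   |   |   |   |   |
| UP IN DETAIL        |   |   |   |   |   |   |
| NINE                |   |   |   |   |   |   |
| PROB OF DETECTION   |   |   |   |   |   |   |
| NEUTRAL             |   |   |   |   |   |   |
| JULIETT             |   |   |   |   |   |   |
| SPEED               |   |   |   |   |   |   |
| UNIFORM             |   |   |   |   |   |   |
| SENSOR              |   |   |   |   |   |   |
| TANGO               |   |   |   |   |   |   |
| CLOSE OUT CHARLIE   |   |   |   |   |   |   |
| LOAD THE GANN       |   |   |   |   |   |   |
| OSCAR               |   |   |   |   |   |   |
| NORTH ATLANTIC MAP  |   |   |   |   |   |   |
| PACIFIC DATA BASE   |   |   |   |   |   |   |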



| HUMAN FACTORS       |   |   |   |   |   |   |
| FOXTROT             |   |   |   |   |   |   |
| SOVIET              |   |   |   |   |   |   |
| DEFENSE             |   |   |   |   |   |   |
| ONE                 |   |   |   |   |   |   |
| INDIA               |   |   |   |   |   |   |
| ADVANTAGES          |   |   |   |   |   |   |
| GOLF                |   |   |   |   |   |   |
| CANCEL              |   |   |   |   |   |   |
| ZULU                |   |   |   |   |   |   |
| NEGATIVE            |   |   |   |   |   |   |
| PLOT ALL SUBMARINES |   |   |   |   |   |   |
| XRAY                |   |   |   |   |   |   |
| REFUEL              |   |   |   |   |   |   |
| AUTO RECOGNITION    |   |   |   |   |   |   |
| QUEBEC              |   |   |   |   |   |   |
| TRACK ENEMY         |   |   |   |   |   |   |
| LEVEL TWO           |   |   |   |   |   |   |
| COURSE              |   |   |   |   |   |   |
| JOINT TASK FORCE    |   |   |   |   |   |   |
| SIX                 |   |   |   |   |   |   |
| WHISKEY             |   |   |   |   |   |   |
| ATTACK              |   |   |   |   |   |   |
| SIERRA              |   |   |   |   |   |   |
| MANEUVER DELAY      |   |   |   |   |   |   |
| DISTANCE            |   |   |   |   |   |   |
| EXECUTE             |   |   |   |   |   |   |
| EIGHT               |   |   |   |   |   |   |
| VICTOR              |   |   |   |   |   |   |
| MEDITERRANEAN MAP   |   |   |   |   |   |   |
| SEA OF JAPAN        |   |   |   |   |   |   |
| POPPA               |   |   |   |   |   |   |
| FILE TNSFR PROTOCOL |   |   |   |   |   |   |
1 1 


1 

1 

1 


1 

1 

1 

1 


i 

i 

i 

_ i 


i 

i 

i 

i 


i 

i 

i 

i 


! ATTITUDE ! 

I,, T— rnr 1 T 


1 

1 

_ _ 1 


1 

1 

1 

1 


i 

i 

i 

_ i 


i 

i 

i 

i 


i 

i 

i 

i 


! HOTEL j 


1 

1 

1 


1 

1 

1 

1 


i 

i 

i 

_ i 


i 

i 

i 

i 


i 

i 

i 

i 


i NUKE TILL THEY GLOW! 


1 

1 


1 

1 

1 

1 


i 

i 

i 

i 


i 

i 

i 

i 


i 

i 

i 

i __ _ 


j ACCAT TITLE j 


1 

1 

1 


1 

1 

1 

1 


i 

i 

i 

- 1 _ 


i 

i 

i 

i 


i 

i 

i 

i 


if* IKE ! 

« _ _ _ _ 1 


1 

1 

1 


1 

1 

1 

1 


i 

i 

i 

_ i 


i 

i 

i 

i 


i 

i 

i 

i 


i FISSILE ! 

1 1 


1 

1 


1 

1 

1 

1 

1 


i 

i 

i 

_ t 
i 


i 

i 

i 

i 

i 


i 

i 

i 

i 

i 


DATA REDUCTION

# NON-RECOGNITIONS

# MIS-RECOGNITIONS

# TOTAL ERRORS



APPENDIX I 



MASTER LIST OF UTTERANCES



1. ONE SYLLABLE UTTERANCES (15)

ONE 

TWO 

THREE 

FOUR

FIVE

SIX 

EIGHT 

NINE 

GOLF

MIKE 

LAUNCH 

TIME 

SHOOT 

SPEED

COURSE 



2. TWO SYLLABLE UTTERANCES (35) 

EUROPE 

LOGOUT 

ZERO 

SEVEN 

ALPHA 

BRAVO 

CHARLIE 

DELTA 

ECHO 

FOXTROT

HOTEL 

KILO 

LIMA 

OSCAR 

POPPA 

QUEBEC 

TANGO 

VICTOR 

WHISKEY 

XRAY 

YANKEE 



ZULU 

COMMAND 

REPORT 

OFFENSE 

DEFENSE 

ATTACK 

PROCEED 

CANCEL

MESSAGE 

DISTANCE 

NEUTRAL 

MISSILE 

SENSOR 

REFUEL



3. THREE SYLLABLE UTTERANCES (20)

MOVE IT LEFT 
SOVIET 

JOINT TASK FORCE 

NOVEMBER

JULIETT

ROMEO 

SIERRA 

INDIA 

UNIFORM 

KOREA 

NEGATIVE 

POSITIVE 

EXECUTE 

AIRFIELD NAME 

ALTITUDE 

RELOCATE 

LOAD THE GANN 

LEVEL TWO

SATELLITE 

TORPEDO 



4. FOUR SYLLABLE UTTERANCES (14)

CARRIAGE RETURN 
LOGIN YELLEN 
STRAIT OF HORMUZ 
UNITED STATES 
FLIGHT CONTROLLER 
AVAILABLE 
AFFIRMATIVE 
UP IN DETAIL 



CLOSE OUT CHARLIE
HUMAN FACTORS 
ADVANTAGES 
TRACK ENEMY 
SEA OF JAPAN
ACCAT TITLE 



5. UTTERANCES GREATER THAN OR EQUAL TO 5 SYLLABLES (16)

MANEUVER DELAY 
CHANGE DIRECTORY TO POOCK
IDENTIFICATION
TASK FORCE COMMANDER
PLACE A CIRCLE ON MOSCOW 
GROUND CONTROL APPROACH 
ENEMY DETECTION 
NORTH ATLANTIC MAP 
MEDITERRANEAN MAP 
PROBABILITY OF DETECTION
OPERATIONS PLAN 
PACIFIC DATA BASE
PLOT ALL SUBMARINES 
AUTOMATIC RECOGNITION 
FILE TRANSFER PROTOCOL
NUKE THEM TILL THEY GLOW 




APPENDIX J



INDIVIDUAL SUBJECT RECOGNITION RATES

The following are mean error rates for each subject
participating in the experiment. The data are
partitioned to mirror the groups established in the
overall experimental design and are expressed in percent
error.
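
As an illustration only, the arithmetic behind a mean percent error figure can be sketched as follows. The sketch assumes that total errors are the sum of non-recognitions and mis-recognitions (as the data reduction rows of the collection form suggest) and that each trial covers the 100 utterances of the master list. The function names, the trial structure, and the example counts are hypothetical and are not taken from the experiment.

```python
from statistics import mean


def percent_error(non_recognitions, mis_recognitions, utterances):
    """Percent error for one trial: all errors divided by utterances spoken."""
    if utterances <= 0:
        raise ValueError("utterances must be positive")
    # Assumption: total errors = non-recognitions + mis-recognitions.
    total_errors = non_recognitions + mis_recognitions
    return 100.0 * total_errors / utterances


def mean_percent_error(trials, utterances_per_trial=100):
    """Mean percent error over a subject's trials.

    trials: list of (non_recognitions, mis_recognitions) pairs, one per trial.
    utterances_per_trial: assumed to be 100, the size of the master list.
    """
    return mean(percent_error(n, m, utterances_per_trial) for n, m in trials)


# Hypothetical counts for one subject over six trials (not thesis data).
example_trials = [(3, 2), (4, 1), (2, 2), (5, 0), (1, 3), (2, 1)]
subject_rate = round(mean_percent_error(example_trials), 2)
print(subject_rate)
```

A different number of trials or utterances per trial changes only the inputs, not the calculation.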



GROUP I      GROUP II

 4.56         13.11
 ?.17          9.22
 7.56          6.89
 4.56          8.39
 9.22          5.22
 6.44          6.89
 6.23          6.72
 8.06          6.33
 1.61          4.06
 2.89          2.00
 2.61          1.67




GROUP III    GROUP IV

 4.06         10.11
 2.11         15.17
  .50          4.89
 6.94         15.72
 9.28          8.06
 4.32          9.06
 6.72          5.44
 8.22          6.28
 4.50          2.39
 2.94          7.11
 2.61          4.23





INITIAL DISTRIBUTION LIST



                                                   No. Copies

1. Defense Technical Information Center
   Cameron Station
   Alexandria, Virginia 22314

2. Superintendent
   ATTN: Library, Code 0142
   Naval Postgraduate School
   Monterey, California 93940

3. Superintendent
   ATTN: Professor G. Poock, Code 55Pk
   Naval Postgraduate School
   Monterey, California 93940

4. Superintendent
   ATTN: CDR C. Hutchins USN, Code 55Hu
   Naval Postgraduate School
   Monterey, California 93940

5. Superintendent
   ATTN: Professor D. Neil, Code 55Ni
   Naval Postgraduate School
   Monterey, California 93940

6. Superintendent
   ATTN: Code 012A
   Naval Postgraduate School
   Monterey, California 93940

7. Superintendent
   ATTN: Code 39
   Naval Postgraduate School
   Monterey, California 93940

8. IBM
   Communications Products Division
   ATTN: David . Davenport
   PO Box 12195 Dept D4S/B632
   Research Triangle Park, NC 27709




9. SINGER                                          1
   Link Simulation Systems Division
   ATTN: Dr. E. Scott Baudhuin
   11602 Tech Road
   Silver Springs, Maryland 20904

10. Naval Electronics Systems Center               1
    ATTN: Frank Deckelman, Code 612
    2511 Jefferson Davis Highway
    Arlington, Virginia 20262

11. Chairman, C3 Academic Group                    1
    ATTN: Professor Michael Sovereign, Code 2y
    Naval Postgraduate School
    Monterey, CA 93940

12. Director                                       1
    National Security Agency
    ATTN: Ms. Jeanne B. Kim, Code H44
    Fort George Meade, Maryland 20755

13. US Army War College                            1
    Department of War Gaming
    ATTN: CPT(P) H. W. Yellen
    Carlisle Barracks, Pennsylvania 17013


